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THE ANALYSIS OF VARIATION IN A NATURAL POPULA- 
TION OF LADY BEETLES 


Frank M. Srurtevant, Jr., Lyte D. CaLvIN AND ORLANDO 


From the Division of Biological Research, G. D. Searle & Co., Box 5110, Chicago 80, 
Institute of Statistics, North Carolina State College, Raleigh; 
and Cresap Biological Laboratory, Northwestern University, Evanston 


INTRODUCTION 


Biometric and taxonomic studies of quantitative characters in 
natural populations have led to the discovery of cases of continuous 
geographic variation, or clines, and of subpopulations displaying 
various degrees of reproductive and morphological intergradation. 
Because of the empirical fact that two or more natural entities may 
bear any degree of relationship along the spectrum of reproductive 
isolation, from complete intersterility to complete interfertility, regard- 
less of their absolute or partial geographic coexistence, only an arbitrary 
point can be established on that spectrum for the taxonomic convenience 
of division of such entities into subgroups. The polytypic species 
concept (Mayr, 1942, Chap. 6) is a function of great subjectivity 
when applied to geographically separated, and especially morphologically 
similar, populations. The danger lies in the necessity of classifying 
microevolutionary units for ease of handling and in the implicit assump- 
tion by many taxonomists that such classifications reflect the true 
biological order of nature. Many authors have vainly sought a species 
definition for lack of realization of this latter point (see discussion by 
Gilmour, 1951). 

One of the most objective methods for investigating the systematic 
relationships between evolutionary units is that of Womble (1951), 
even though some subjectivity may enter the picture in the weighting 
of various characters. Fisher (1936) has dealt with such weightings 
by the use of discriminant functions. In order to apply Womble’s 
differential systematics, a good deal of information first must be ac- 
cumulated on the distribution of points on the geno-, pheno-, or ecocline. 
The genocline is the most fundamental of the three and much work 
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has been pursued on geographic variation in the frequencies of certain 
genes. 

A prime requisite for genocline analysis is the characterization of 
the genetics and variation in a natural population. Shull (1944) has 
published the results of genetic studies on several populations of the 
lady beetle Hippodamia convergens Guer. This species of coccinellids 
is one of the most common of the lady beetles and is found in abundance 
throughout North America. One of its distinguishing characteristics 
is the presence on each elytron, or wing-cover, of six black spots or 
maculations. Mainly through Shull’s efforts, the heredity of the 
elytral maculations in this species and the genetics of interspecies 
relationships have been well-developed (Shull, 1949). A review of the 
literature on the genetics of the Coccinellidae appeared in an earlier 
paper by Shull (1943). 

The purpose of the present paper is to investigate the variation of 
a genetic character in a natural population of Hippodamia convergens, 
illustrating a method for the separation of overlapping distributions of 
phenotypes. 


THE SPOTLESS PHASE 


The elytral maculations of Hippodamia convergens vary greatly in 
size and may be wholly or partly absent, in which case the condition is 
known as the spotless phase. Since the phenotypes of the spotless and 
spotted phases overlap, Shull (1946) devised a method for their separa- 
tion. After many genetic crosses, he found that spotless beetles could 
be identified best by considering only the posterior three maculations, 
which, he observed, tended to be larger than the anterior three in 
spotted beetles and to show a greater frequency of absence in spotless 
beetles. An arbitrary unit for the measurement of the greatest dia- 
meter of a spot was established such that no maculation was scored as 
greater than four; therefore, no total of the three posterior spots could 
exceed 3 X 4 = 12. 

In Shull’s laboratory population, which had been derived from a 
single pair of Michigan beetles (Shull, 1951), it was found that the 
inheritance of spot diameters was such that a posterior total of 7.74 or 
less would divide, on the average, the spotted and spotless phases 
(Shull, 1946). It was further established that the spotless phase was 
governed by “one almost dominant gene’, plus several modifiers 
(Shull, 1944). 

Since Shull, then, had established that the size of the elytral macu- 
lations in this species was under genetic control, it was first necessary 
to examine the variation of the posterior total relative to the anterior 
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VARIATION IN NATURAL POPULATION 119 
total, and second to establish a method for separating the spotless and 
spotted components of the natural population at hand. It was not 
to be expected, on a priori grounds, that the area of overlap would 
be the same for a highly inbred laboratory stock and a natural popu- 
lation, because of the genetic drift which so often accompanies in- 
breeding. 


MATERIAL AND METHODS 


A hibernating population of 1,021 Hippodamia convergens beetles 
was collected in a clump of Calamovilfa grass in the foredune stage of 
the Ogden Dunes, Porter County, Indiana on October 26, 1946, at 
11:00 AM. The right elytra from 1,011 of these specimens were re- 
moved and affixed to slides, ten elytra being damaged in the preparation. 
The greatest diameter of each maculation was then measured micro- 
scopically in units ranging from 0 to 8. 

In accordance with Shull’s observations (1944) that the three 
posterior maculations appeared to act as a unit, totals of the greatest 
diameters of these spots for each elytron were made and their frequencies 
tabulated (table 1). The same was then done for the three anterior 
spots on 124 beetles chosen at random. The regression line of the 
anterior sums (y) on the posterior sums (x) for these 124 elytra was 
calculated by least squares: y = 2.17 + 0.3442 (fig. 1). The standard 
deviation of the slope was + 0.108. The coefficient of correlation was 
0.67, with 99 per cent fiducial limits at 0.78 and 0.52. 


TABLE 1. 
Frequencies of the totals of the three posterior maculations of the right elytra of 1,011 Hippodamia 
convergens 
Class Frequency Class Frequency 

0 39 ll 95 

1 10 12 123 

2 8 13 147 

3 9 14 131 

4 9 15 106 

5 13 16 59 
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It was evident that the posterior spots, as a group, tended to be 
larger than the anterior in the miixed sample (fig. 1). The positive 
y-intercept does not mean that individually the posterior three spots 
were most frequently absent. The frequency of absence of each spot, 
numbered anterior to posterior, was calculated and entered in table 2. 
Similarly, of the total occurrence of spot-absence of 381 cases, the 
individual spots accounted for the percentages listed in the last column 
of table 2. The posterior spots were absent more frequently than the 
anterior. These two findings agree with Shull’s observations cited 
earlier. 

With the exception of spot I, the posterior three maculations 
showed a greater frequency of absence and accounted for greater 
fractions of the total spot-absence in the mixed sample. Since, as a 
group, the posterior spots behaved as predicted by Shull’s observations, 
although spot I also exhibited a high tendency to be absent, the fol- 
lowing calculations were based on the posterior sums. Such procedure 
would enable direct comparison of the present results with those of 
Shull. 


ANTERIOR 


2 6 8 10 12 16 20 22 
POSTERIOR 
FIG. 1. 


Regression of the greatest-diameter sums of the three anterior on the three posterior maculations of 

124 right elytra selected at random from the natural population of 1,011 Hippodamia covergens beetlea. 

The number of elytra at any one point varies from one to eleven, and is represented by different sized 
circles. 
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TABLE 2. 


Absences of each elytral maculation, numbered anterior to posterior, in 124 Hippodamia convergens 
chosen at random from the natural population 


Cases of Absence 
Spot 
Per cent of Per cent of all 
Number possible spots absent 
I 98 79 26 
II 28 23 7 
Ill 35 28 9 
Anterior 161 43 42 
IV 52 42 14 
Vv 55 44 14 
VI 113 91 30 
Posterior 220 59 58 
Totals 381 100 


ESTIMATION OF POPULATION PARAMETERS 


Employing the data in table 1, a histogram was constructed from 
the frequencies of the sums of the greatest diameters of the three 
posterior maculations on 1,011 right elytra (fig. 2). Since the units of 
measurement used in this paper are twice those employed by Shull, 
the point on the abscissa which would separate the spotless and spotted 
beetles according to his method would be twice his separation point, 
that is, 2 X 7.75 = 15.50. From inspection of figure 2, it is seen that 
this value is much greater than it should be for the present population. 
There was only a slight probability that the separation points of the 
two populations would coincide, because the population at hand was a 
natural one, while Shull’s was a laboratory stock derived from a single 
pair of beetles. Such a cultured stock, especially when small, is highly 
susceptible to the effects of genetic drift, and as a result, one would 
expect a priori a shift in the separation point and mean of the spotted 
beetles. The standard deviation of the spotted population was probably 
smaller in Shull’s stock, since he had a separation point at 15.50, when 
the maximal possible value was 24.00 (in our units). Obviously, 
some method for separating the spotted and spotless phases other than 
Shull’s would have to be utilized in the present case. 


{ 


122 BIOMETRICS, JUNE 1953 


NUMBER OF RIGHT ELYTRA 
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SUM OF THREE POSTERIOR MACULATIONS 


FIG. 2. 


Histogram of frequencies of greatest-diameter sums of the three posterior maculations of 1,011 right 

elytra from a natural population of Hippodamia convergens. Mean (M) and standard deviation (S) 

calculated by truncating the histogram at xo ; the corresponding normal curve is imposed on the 

diagram. Area under the normal curve represents the 897 spotted beetles; remaining area to the left 
represents the 114 spotless beetles. 


The histogram in figure 2 is obviously bimodal, and the population 
to the right (the spotted beetles) seemed to describe a normal curve. 
It was reasoned therefore that the two populations could be separated 
best by cutting the curve at an arbitrary truncation point (z.), beyond 
which it would be highly improbable that any spotless beetles would 
fall, and then calculating the mean and standard deviation of the 
truncated normal curve. From these statistics, the size of the popu- 
lation under the truncated curve could be calculated. The estimation 
of the total numbers of spotted and spotless beetles from these data 
would then be a relatively simple step. The method used for working 
with such a truncated normal curve with unknown population size 
was that of Cohen (1949). His notation has been followed in the 
present paper with the exception that the primes have been omitted. 
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Fitting the Normal Curve 


The point of truncation was selected by inspection as z, = 9.5. 
This point is an arbitrary choice and might have been taken slightly 
lower. It was taken as the point above which, in the authors’ opinion, 
there was a very low probability that any spotless beetles would occur. 

The necessary summations are >, x = 3288.0 and >> x” = 17,205.5, 
where z is measured from xz). The expression 


1 (> 2)* 


was calculated as 0.30820, where n is the number of beetles (822) in 
the truncated sample. y, is needed only in estimating h and ¢, and is 
a moment function of h (the point of truncation measured in standard 
units; that is, h = (xz) — u«)/o, where uv and o are the mean and standard 
deviation respectively of the population). 

Cohen has plotted the relationship between y, and A, so that an 
initial estimate of h = —1.40 could be obtained from the graph. By 
successive approximations in the expression 


= - 2) 


h was calculated as —1.383 and Z as 0.16725. Z is defined by the 
identity 


Zh) = o(h)/To(h), 


where ¢(h) is the ordinate of the normal curve at ¢ = h, and J,(h) is 
the area under the curve to the right of the point t = h. 
The estimates of u and o are given by 


m= 2, — hs 


and 


n 


(Z — h) 


To obtain estimates of the standard deviations of m and s, ten 
independent random samples of 1011 elytra each were drawn from the 
total population of 1011 and separate estimates of m and s calculated 
for each sample. The standard deviations of m and s, each with 9 
degrees of freedom, were obtained from these 10 samples. The values 
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of m and s, with their standard deviations, are 
| 13.068 + 0.179 
2.580 + 0.081 


Having calculated the mean and standard deviation, the truncation 
point in standard units was given by h = —1.383. The area to the 
right of z» is calculated by 


where 


f() = var (Kenney, 1947, p. 116 ff.). 


From the table for the normal curve, 
s@-at = 0.4166 + 0.5000 = 0.9166. 
—-1.383 


The total number of beetles in the spotted population is calculated 
from 0.9166 N = 822. N = 896.8 or 897 and the number of beetles in 
the spotless population is 1011 — 897 = 114. The standard error of 
N is estimated by replicate calculations of N from the 10 random 
samples of 1011 elytra each; sy = 14. 

To impose the normal curve on the histogram in figure 2, values of 
x were substituted in the equation t = (x — m)/s and the corresponding 
standard-unit ordinates read from the table for the normal curve. 
These were converted in turn by substitution in the expression y = N- 
f(t)/s, where f(t) is the ordinate (Kenney, 1947, p. 125). The solutions 
for y are plotted in figure 2 against the appropriate values of x. 

The goodness of fit of the experimental data to the right of the 
point of truncation was tested by x” = 13.02, d.f. = 9, P = 0.17. 

At the suggestion of Wright (1952), a semilog parabola was fitted 
by least squares to the data after plotting the log frequencies in table 1 
against the classes. The goodness of fit was tested as before: x’ = 25.94, 
d.f. = 10, P < 0.01. Obviously, the normal curve proved a better 
fit to the data than the semilog parabola.* 


The Distribution of Spotless Beetles 


After removing the spotted beetles from the mixed population by 
either of the above two methods, the remaining spotless beetles assumed 


*Dr. G. E. P. Box has pointed out that, if the log frequencies are suitably weighted in a least squares 
analysis, this procedure should give improved estimates of the mean and variance which would yield a 
smaller x? for the goodness of fit test. 
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VARIATION IN NATURAL POPULATION 125 
a uniform distribution, except for a mode at class zero (fig. 2). This 
mode could have been produced by the necessarv truncation at this 
point. In contrast to Shull’s evidence (1944) that his laboratory 
population contained some four modifiers of the gene Spotless, Wright 
(1952) commented that the present distribution of spotless beetles 
would not be expected if variability depended on “some four more or 
less equally frequent and equally important modifiers. There is some 
suggestion of one main modifier or, perhaps more probably, an indication 
of such inequality in the physiological significance of scale units in 
this region that no interpretation is warranted.’ 


The Question of Gene Frequencies 


- Shull (1944) applied his criterion of the spotless phase to several 
widely distributed populations of Hippodamia convergens. Using the 
Hardy-Weinberg equilibrium formula*, which disregards systematic, 
random and nonrecurrent changes in gene frequency (Wright, 1949), 
the frequencies of the gene Spotless for each population were calculated. 
Further, assuming four partially dominant, equally abundant and 
effective modifiers of Spotless, the frequencies of the modifiers were 
calculated**. 

Similar computations with the present data yielded a frequency of 
Spotless of 1 — (897/1011)* = 5.81 per cent, which compares favorably 
with Shull’s values of 5.84 and 6.45 for two Michigan populations. 
However, such statistics are only rough approximations, because of the 
large phenotypic overlap, the disregard of such factors as mutation and 
selection pressures, and the indirect evidence relating to the inheritance 
of the spotless phase in nature. As was noted earlier, genetic drift is 
common in small laboratory stocks and, therefore, extrapolation to 
natural populations cannot be made with a high degree of confidence. 

On the other hand, genetic drift would not be expected a priori in 
natural populations of Hippodamia, according to the known ecology of 
the genus (Cutright, 1924; Park, 1930; Balduf, 1935). Although hiber- 
nation occurs in small restricted groups (Allee et al., 1949, p. 538), mating 
does not occur until these groups have dispersed in the spring; thus 
genetic drift would be held to a minimum and neighboring populations 
would not be expected to differ greatly in gene frequencies. 


* k 2 
| = «so , where q: is the gene frequency of the spotless gene S; . 
i=1 


**] — (Pe/P,)'/8, where Po is the number of beetles of class sero (table 1) and P, is the total number 
of spotless beetles. 
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SUMMARY 


A natural population of the lady beetle Hippodamia convergens 
Guer. was submitted to a statistical analysis of its elytral pattern in 
order to separate two overlapping phenotypes. These phenotypes were 
the so-called “spotless” and “spotted” phases, the inheritance of which 
had been investigated earlier by A. F. Shull. It was shown that the 
three posterior elytral spots, as a group, tended to be absent more 
frequently than the anterior three, and also tended to be larger in size. 
A normal curve was fitted to the arbitrarily truncated frequency distri- 
bution of the total of the greatest diameters of the three posterior 
maculations of spotted beetles. The normal curve gave an acceptable 
fit as tested by chi-square. From the statistics computed, the popula- 
tion sizes of the spotted and spotless beetles were calculated. The 
question of gene frequencies in this and other populations was discussed. 
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THE CHAIN BLOCK DESIGN 


W. J. Youpen anp W. S. Connor 
National Bureau of Standards, Washington 26, D. C. 


1. Introduction. The development of the design of experiments 
has been inspired largely by the needs of agriculture and biology. 
The first important steps were taken by R. A. Fisher and F. Yates, 
both of whom were on the staff of the Rothamsted Experiment Station. 
The methods devised by these workers are widely used in biology and 
agriculture and, to some extent, in other sciences. 

We believe that one of the reasons for the delay in the adoption 
of experimental designs in the physical sciences is that the classical 
designs often do not meet the experimental situations encountered in 
these sciences. In fact, when one compares the experimental conditions 
which exist in the biological and agricultural sciences with the ex- 
perimental conditions of the physical and chemical sciences, one is at 
once struck by certain fundamental differences. A difference which 
is of paramount importance is the magnitude of the errors of measure- 
ment in the two areas. In agricultural and biological experimentation 
the basic experimental material often is land or animals, and the varia- 
tion over~a field or between litters is likely to be large. To compensate 
for this large variation, the classical designs require repeated measure- 
ments to effect a reduction in the error of the estimates. 

Physical measurements can be made with high precision and the 
experimental material usually is relatively homogeneous. It follows 
that it is not necessary to put great reliance on replication in order to 
achieve good results. Excellent estimates may be obtained from one 
measurement, or at most, two or three. 

It is with the needs of the physical and chemical sciences in mind 
that we offer the Chain block design. This design is very flexible, 
and the number of replications is small. Thus, for a set of quantities 
to be compared, this new design calls for some of them to be measured 
once, and the others twice. 
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2. Definition of the Chain Block Design. Let us begin by considering 
a simple example of a Chain block design. Suppose 10 quantities are 
to be compared. Denote the quantities (traditionally, and hereafter, 


called treatments) by letters and arrange them in three blocks as fol- 
lows. 


BLOCK 

1 2 3 

A & E 
B D F 

& E A 

(2.1) 

a b c 

d 


We observe that the treatments which are denoted by capital letters 
are replicated twice each, whereas the treatments which are denoted 
by small letters are replicated only once each. Further, A never occurs 
in a block without B, nor C without D, nor E without F. Thus, the treat- 
ments which are replicated twice fall into 3 groups, and these groups 
are the links in the chain of blocks. Treatments C and D link blocks 
1 and 2, E and F link blocks 2 and 3, and A and B complete the chain 
by linking blocks 3 and 1. 

In general, there are v treatments which are arranged in b blocks. 
A treatment is said to belong to class C, if it is replicated once, or to 
class C, if it is replicated twice. There are v, treatments in C, and 
v, treatments in C,. 

The treatments of C, are divided into b groups of n,; treatments 
each, (j = 1, --- , 6), where a group is characterized by the property 
that the treatments of the j-th group occur in the j-th block, and no- 
where else. 

Similarly, the treatments of C, are divided into 6b groups of n, 
treatments each, (n2 > 1), where a group is characterized by the 
property that if a treatment of the group occurs in a block, then the 
other n, — 1 treatments of the group also occur in the block. Further, 
blocks j and j + 1 have in common the treatments of the j-th group, 
(j = 1, --- , b; mod b), but no other treatments in common. No other 
two blocks have any treatments in common. 

We shall denote the number of units in a block by k, and let 
>3.. k; = N. From the above, the following relations hold: 


ig 
{ 
= 
| 
aa 
Bt | 
> 
| 
| 


CHAIN BLOCK DESIGN 129 
v; + = 
Vv, + = N, 
(2.2) 
bnz = and 
If we let G,; , (@ = 1, 2;7 = 1,---, b). denote the treatments of 


the 7-th class which are in the j-th group, then we may display the 
Chain block design as follows: 


BLOCK 
& Gian 
& Ga (2.3) 
« & Gi 


Since the average number of replicates is (vy, + 2v,)/v, we may 
refer to the Chain block design as a ‘‘(v, + 2v,)/v replicate” design. 
Of course, 


1 < + 2,)/v < 2. (2.4) 


3. Some Practical Aspects of the Design. Perhaps the most striking 
aspect of the design is its flexibility. The numbers of treatments in 
C, and C, are largely at the disposal of the experimenter, as we shall 
see in the following example. Let b = 4 and v = 22, and let there be 
available only 10 plots per block. Then the following types of Chain 
block designs are possible. 


DESIGN 1: BLOCK 

1 2 3 4 

A ¢ E G 

B D F H 

C E G A 

D F H B (3.1) 
a m q t 

n r 

k 0 -9 


i 
: 
3 
i 
ib 
if, 
= 
{ 


130 BIOMETRICS, JUNE 1953 


In this design, v, = 14, v, = 8, N = 30, and there are 5 degrees of 
freedom for error. The number of degrees of freedom for error in the 
general case is given in Section 4. 


DESIGN 2: BLOCK 


(3.2) 


e883 AWE 
22 BDAY 
ee 


In this design v, = 10, v», = 12, N = 34, and there are 9 degrees of 
freedom for error. 


DESIGN 3: BLOCK 

1 2 3 4 
A E I M 
B F N 
C G K O 
D H L 
E I M A (3.3) 
F rf N B 
G K O +4 
H L D 
v 
t 


In this design v, = 6, v. = 16, N = 38, and there are 13 degrees of 
freedom for error. We have omitted one possibility, that for which 
v, = 18 and »v, = 4, since in that design there is only one degree of 
freedom for error. 

The choice of a design from among these three depends on several 
considerations. For one thing, it is clear that the estimates of the treat- 
ments of C, will have smaller variance than those of C, . Hence, the 


‘ 
i 
Le 
ae 
LE 
Ber 


CHAIN BLOCK DESIGN 131 


experimenter will have to decide whether there are approximately 
8, 12, or 16 treatments which should be estimated with the smaller — 
variance. Again, the experimenter must decide whether he needs 
5, 9, or 13 degrees of freedom for error, and this decision will depend 
in part on the magnitude of the variance of a single determination or 
yield. 

Although the definition of the Chain block design does not specify 
the distribution of the treatments of C, among the blocks, it seems 
desirable to distribute them as evenly as possible. For example, in 
Design 1 we have put either 3 or 4 treatments of C, in every block. 

Now consider the difference between the estimates of any two 
treatments. The variance of this difference will depend on where 
the treatments are in the design. For example, consider the treatments 
of C, in Design 1. If the treatments are in the same group, as A and 
B, the variance of the difference between their estimates is the smallest 
variance in the design. 

To explain this point further, let us think of the groups as arranged 
in a circle. Thus, around the circle we have groups !, 2, --- , b, withb 
followed by 1. Now consider any two groups of C, , say j andj’. Let 
the number of groups which lie between j and 7’, when counted in the 
shortest direction around the circle, be p. Then, the variance of the 
difference between the estimates of a treatment of j and one of 7’ varies 
directly with p. Thus, in Design 1, we have a larger variance for the 
difference between A and C than for A and B, and a still larger variance 
for the difference between A and EZ. Similar statements can be made 
for treatments in C, , or for one treatment from C, and the other from 
C,. The experimenter should, in so far as it is possible, be guided by 
these considerations when assigning treatments to groups. 

4. The Analysis in General. The analysis which we “shall give 
below can be rigorously justified, for example, by the theory of [1]. 

Let the typical yield be denoted by z;,;,. , where the first index 
refers to the class, the second to the group, the third to the treatment, 
and the fourth to the block. Thus, 7 = 1 or 2;7 = 1,--- ,b;u = 1, 

-,n;;andz=jorj +1. Let ¢,;, denote the effect of the u-th 
treatment of the j-th group of C; , and b, denote the effect of the z-th 
block. Then we assume that 


Liive = tiie + b, + €iiuy (4.1) 


where ¢,;;, is a random variable with mean, zero, and with variance, 
o. Further, we assume that ¢,;, is independent of the corresponding 
random variable which is associated with any other yield. 


Let the least squares estimate of a treatment or block effect be 


|: 
Wee 
lye 
| 


132 BIOMETRICS, JUNE 1953 


denoted by putting a circumflex (-) over the corresponding parameter. 
Thus ¢,;, is the estimate of ¢;;, and b, is the estimate of b, . 
Also, let 


u=1 
Then by imposing the restriction, 
b 
= 0, (4.2) 
z=1 


we may estimate the effect of the j-th block by 
(2nab)b; = (b — 2y — — (4.3) 


where a is the largest integer which is less than or equal to (b — 1)/2. 
In (4.3) and in the formulas below the subscripts should be reduced, 
mod b. 


The treatment estimates are easily found by using (4.3). Thus, 


and 
= + — b; = (4.5) 


To carry out the analysis of variance, we first compute the sum of 
squares due to error, which we shall denote by S’(e). For the j-th 
group of C, we form the differences, 


Now d;, is an estimate of the difference between the j-th and (j + 1)st 
blocks. Next compute 
= - ao (Ha) | an 
i 2 iu iu, 2 | 2 iu n, \& iu 


where 


The quantity Sj(e) is an estimate of the sum of squares due to error, 
based on (n, — 1) degrees of freedom. By computing a similar quantity 
for each group of C, , we obtain an estimate of the sum of squares due 
to error, based on b(n, — 1) degrees of freedom, 
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The remaining degree of freedom is given by 


Hence. 
Se) = + Y. 14.9) 


We now compute the sum of squares due to blocks, uncorrected 
for treatments, and the total sum of squares; and then obtain the sum 
of squares due to treatments by subtraction. If we let B; denote the 
sum of the yields in the j-th block and G denote the sum of all of the 


yields, then the sum of squares due to blocks uncorrected for treat- 
ments is 


S*(b) = Bi/k; — 4.10) 
and the total sum of squares is 
S(T) = vw. — 


where the summation is over all values of the indices. 
We obtain the sum of squares due to treatments by subtraction, thus: 


S(t) = S(T) — Se) — S*(b). (4.12) 
The analysis of variance table is as follows: 


ANALYSIS OF VARIANCE TABLE I 


Degrees Sum of Mean 
Due to of Freedom Squares Square 
Treatments 
(corrected for 
blocks) v-1 S*(t) s2(t) = S*(t)/(v — 1) 
Blocks 
(uncorrected) b-1 S3(b) = S3(b)/(b — 1) 
Error N-—-v—b+1] S*e) = S%(e)/(N —v — b 4-1) 
Total N-1 S*(T) 


We may be interested in the sum of squares for the treatments 
of C, alone. If so, let 


6 
Bay = + Xai: G; = Bai 
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and compute the quantities, 

sw =| - a]. (4.13) 
and 

SUT) = — G2/2nqb, (4.14) 


where the summation is over all values of the indices. Finally, the 
sum of squares due to treatments of C, , corrected for blocks, is 


= ST) — Se) — S2(d), (4.15) 
and the corresponding mean square is 
83(t) = S3(t)/(v2 — 1). (4.16) 


We may want the sum of squares due to blocks corrected for treat- 
ments. This quantity can be found as follows. We compute the 
quantities 


(D, D,-1), (4.17) 
(z = 1, --- , 6), and then find the sum of squares due to blocks cor- 
rected for treatments, S’(b)’, by 


S*(b)’ = > bP, . (4.18) 


The sum of squares due to treatments, uncorrected for blocks, 
S’(t)’, is found by subtraction, thus, 


Si)’ = S(T) — — (4.19) 
We now write out the analysis of variance table. 


ANALYSIS OF VARIANCE TABLE II 


Degrees of Sum of Mean 
Due to Freedom Squares Square 
Treatments 
(uncorrected) v-1 S*(t)’ s(t)’ = S%t)’/(v — 1) 
Blocks 
(corrected for . 
treatments) b-1 S*(b)’ 8*(b)’ = S*(b)’/(b — 1) 
Error N-v—b41 S*(e) s*(e) 
Total N-1 S*(T) 
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If we assume that the errors are normally distributed, we may 
carry out the usua! F and ¢ tests. It should be noticed that the ratio 
of s3(t) to s°(e) is with — 1) and (NV — v — b + J) degrees 
freedom. 

Let 


— 227 


=1+ 
Then for any two treatments of C, , say t;, and toy , 

and for two treatments of C, , say t,;, and fiw , 


— few) = + (1 + (4.2! 


where 7 = j — s, mod b. For a treatment of C, and a treatment 
C, say and ’ 


head 1(3 + j ==, or 
1 — 4i’ 


where 7’ = j — s, mod (b + 1). 

5. An Example. It is characteristic of many experimental situations 
in the physical sciences that the block is sharply defined. This con- 
trasts with the arbitrary designation of a given land area as a block in 
agricultural field trials. For example, spectrographic determinations 
of the chemical elements may be carried out by the comparison of 
spectrum lines as recorded on a photographic plate. A limited number 
of exposures may be made on « plate which is then developed. Ob- 
viously, all determinations on one piate experience the same processing, 
and comparisons within a plate have been demonstrated to be more 
precise than comparisons between samples run on different plates 
When the number of samples exceeds the capacity of a plate an oppor- 
tunity is presented to use an arrangement to correct for block effects 
and to do this with a minimum amount of replication. The example 
chosen concerns a study of the nickel content of a large number of 
rods prepared from the same ingot. The study was made by BK. F 
Scribner of the Spectrochemistry Section of the National Bureau »v. 
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Standards. It is desired to detect possible differences in composition 
The rods were made from a selected portion of the ingot to insure a 
high degree of uniformity among them. 

In the example, b = 3, v, = 30 and v, = 12. a the design 
is a “1? replicate’’. 

We shall let (¢ = 1, 2;7 = 1,2,38;u = 1, , = 10 
or 4 according as 7 = 1 or 2) denote the u-th treatment of the j-th 
group of the i-th class. Then the treatments occur in the blocks as 
follows: 


BLOCK 
1 2 3 
231 211 221 
232 212 222 
233 213 223 
234 214 224 
211 221 231 
212 222 232 
213 223 233 (5.1) 
214 224 234 
111 121 131 
11(10) 12(10) 13(10) 


The amounts of nickel in the rods were recorded as logarithms to 
the base 10 of the ratio of the intensity of the nickel spectral line to 
the iron spectral line. For our purposes, these determinations have 
been coded by multiplying by 10°, and then subtracting 170. We 
give below the coded amounts, and their corresponding symbols. 
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Symbol Amount Symbol Amount Symbol Amount 
Imi = 8 Iniz = 4 = 
Tom = 7 tno = 3 2s = 0 
= 14 Lan = 10 = 
= 9 = 6 = —8 
Zan = 13 Zona = 5 Yas = 1 
Zan = 15 = 7 = 5 
Tas = 12 = 2 = 2 
laa = 9 = 6 tous = (5.2) 
Zin = 1l = 10 = 5 
Tun = 5 = 9 = —1 
Zia = 17 = 6 = —3 
Lia = 14 Ling = 7 Liss = —6 
Lusi = 12 Liana = 6 Zisss = 2 
Zio = 13 = 4 = —2 
= 14 = 7 = —2 
Lun = 12 = 7 Ziszss = O 
= 8 = 9 = 

= 21 = 10 = 2 


We shall follow the method of analysis which was described above. 
Certain preliminary computations will be helpful. Thus, we obtain 
the following values for certain symbols which were defined in section 4: 


= 38 = 23 = —12 
Xa = 49 Xazs = 20 Xsss = 8; (5.3) 


D, = 26, D,=32, Ds, = —30; (5.4) 
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di. = 12 dy. = dy, = —2 

di, = 2 dys = 5 = —12 

dy, =3 dy=14 dy = —9 (5.5) 
d,, = 26 d,, = 32 = —30 
di, = 238 di. = 306 = 278; 


3 
Ba = 87, B. = 43, B,, = —4, G, = 126, > Bi; = 9,434; (5.6) 


i=l 


& 

> 


3 
= 118, B, = —8, G = 324, > B? 


59,784; (5.7) 


Dd zine = 1,388, > 23... = 3,882, (5.8) 


where the summations are over all values of the indices. 
We estimate the effect of the first block by (4.3) and (5.4), thus, 


246, = 2(56) = 112, 
6, = 14/3. (5.9) 
Similarly, 6, = 1/2 and 6,.= —31/6. The sum of these estimates 


is zero, as it should be. 
The estimate of the effect of ¢,,, is found from (4.4) and (5.2) to be 


tu = 11 — 14/3 = 19/3, (5.10) 
and the estimate of the effect of ¢,,, is found from (4.5) and (5.2) to be 
tour = $(13 + 4 — 14/3 — 4) = 71/12. (5.11) 


Other treatment effects are estimated similarly. The sum of these 
estimates is 261, ie., G — 3G, . 

To estimate the sum of squares due to error we find from (4.7) 
and (5.5) that 


= $[238 — 4(26)*] = 34.50, (5.12) 
Si(e) = 25.00, and S3(e) = 26.50; and from (4.8) and (5.3) 


vo si (77 — 49)? = 32.67, (5.13) 
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so that by (4.9), 
S*(e) = 118.67. (5.14) 


From (4.10) and (5.7), the sum of squares due to blocks is 


S*(b) = 7g (59,784) — =; (824) 
= 3,321.33 — 1,944.00 (5.15) 
= 1377.33, 


and from (4.11), (5.7), and (5.8), the total sum of squares is 
S(T) = 3,862 — or (324)? = 1,918.00. (5.16) 


Hence, by (4.12), (5.14), (5.15), and (5.16) the sum of squares due to 
treatments is 


S*(t) = 1,918.00 — 118.67 — 1,377.33 = 422.00. (5.17) 


The analysis of Variance Table I is as follows: 


ANALYSIS OF VARIANCE TABLE I 


Degrees of Sum of Mean 
Due to Freedom Squares Square 
Treatments (corrected for blocks) 41 422.00 10.29 
Blocks (uncorrected) 2 1377 .33 688 .66 
Error 10 118.67 11.87 
Total 53 1918.00 


By (4.13) and (5.6), 


$46) = (9,484) — (126)" 
= 1,179.25 — 661.50 (5.18) 
= 517.75, 


and, by (4.14), (5.6), and (5.8), 
Si(T) = 1,388.00 — (126) 


= 726.50. (5.19) 
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Hence, by (4.15), (5.14), (5.18), and (5.19), the sum of squares due to 
treatments of C, , corrected for blocks, is 


S3(t) = 726.50 — 118.67 — 517.75 


= 90.08, (5.20) 
and by (4.16) the corresponding mean square is 
33(2) = 90.08/11 = 8.19. (5.15) 
From (4.17), and (5.4), 
P, = 3(26 + 30) = 28.00, (5.16) 


P, = 3.00, and P; = —31.00. Thus by (4.18) and (5.9), the sum of 
squares due to blocks, corrected for treatments, is 


S*(b)’ = 292.33, (5.17) 


and by (4.19), (5.14), (5.16), and (5.17), the sum of squares due to 
treatments, uncorrected for blocks, is 


S*(t)’ = 1,918.00 — 118.67 — 292.33 
= 1,507.00. (5.18) 


If the reader desires to do so, he may write out the Analysis of 
Variance Table II. The F ratio for blocks, corrected for treatments, is 


F = 12.31, (5.20) 


which is significant at the .01 level of significance. Thus, we have 
been wise to remove the error due to blocks. 

Since the treatment mean squares are not significant and in fact, 
are less than the error mean square, we might decide not to carry out 
t tests. However, for illustrative purposes, we shall compare the 
effects of 213 and 224. From (4.20), we find that the variance of the 
difference, (ts:; — tax), is (13/12)c°. Hence, using our estimate of 
that is, s*(e), we obtain 


lois — toms | 3.17 (—3.92) | 
3.731 = 1.00, (6.21) 


which is not significant. 
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DESIGN AND ANALYSIS OF TRIANGULAR SINGLY 
LINKED BLOCKS 


K. R. Narr* 


Institute of Statistics 
The Consolidated University of North Carolina 


Design. 

Recently, the author (1950) called attention to the importance of 
enumerating partially balanced incomplete block (p.b.i.b.) designs 
involving as few as two replications of each treatment. In that and a 
subsequent note (Nair, 1951) several examples of p.b.i.b. designs 
involving only two replications and having either 2, 3 or 4 associate 
classes were given. Bose (1951) made an exhaustive study of two- 
replicate p.b.i.b. designs having 2 associate classes. 

One of the two-replicate p.b.i.b. designs given in the author’s (1950) 
paper consisted of a design having the parameters: 


v=(1/2)pp—-1) k=(@-1) r=2 db=p 
1 = 0 

2p — 2) m = 1/2(p — 2)(p — 3) 

7 

— 4) 7 


4) — 4)(p — 5)J 


This design is constructed by dualising the unreduced balanced 
incomplete block design in blocks of 2 plots for which the parameters 
v*, k*, r*, b* and A* are: 


=p, k= 2 =(p—1), 


Pie = 


Pio = 


*Address: Forest Research Institute, Dehra Dun, India. 
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The dual design » = b*, k = r*, r = k*, b = v* has the property 
that there is just one treatment common to every pair of blocks. It 
therefore belongs to the class of designs which Youden (1951) has 
recently used under the name Singly Linked Blocks. 

Bose (1951) has shown that it is easy to write down the blocks of 
the design by using the scheme given below, illustrated for the case 
p = 5. The blocks are given by the rows (or columns). 


3.4 
368 * 10 
° 
SCHEME 1 


Thus, the five blocks of the design are: 


Block Treatment 
Number Number 
(1) 1 2 3 4 
(2) 1 5 6 7 
(3) 2 5 8 9 
(4) "ie 6 8 i0 
(5) 4 7 9 10 


It is interesting to note that the design can also be constructed from a 
simple geometrical configuration. This method consists in drawing p 
straight lines in such a way that they cut each other. There will be 
(p — 1) points of intersection on each line and the total number of such 
points in the configuration will be (1/2)p(p — 1). By considering the 
lines and points as analogous to blocks and treatments respectively, we 
obtain the design. 
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The following diagram illustrates the method for p = 5. 
1 


4 7 


FIGURE 1. 


When p is odd, the configuration can alternatively be presented 
artistically (like a star), as illustrated below for p = 5. 


1 


FIGURE 2. 


One of the advantages of the scheme devised by Bose is that, in one 
stroke, it gives a simple method of constructing the design as well as a 
simple rule for picking out the first and second associates of each treat- 
ment, Thus, in Scheme 1, all treatments whose numbers appear in the 
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same row and column are first associates of each other. For instance, 
the Ist associates of treatment no. 1 are 2, 3, 4, 5, 6 and 7, while its 
second associates are 8, 9 and 10. 

There is another way in which the association scheme of the design 
can be presented. Let us denote each treatment by two numbers 
(t, 7) where 7 takes the values 1, 2, --- (p —1) andj takes the values 
t+ 1,71+ 2, ---, p, fora particular value of i. For instance the treat- 
ments 1, 2, --- 10 of Scheme 1 may be written as follows: 


(12) (13) (14) (15) 
(23) (24) (25) 
(34) (35) 


(45) 
SCHEME 2 


Treatment (77) occurs in the ith and jth blocks of the design. Hence, 
treatments with one digit in common are Ist associates and treatments 
with no digit in common are 2nd associates. 

Bose and Shimamoto (1952) have recently found that the associa- 
tion scheme of a number of p.b.i.b. designs having 2 associate classes, 
for which v = 1/2 p(p —1) but & not necessarily equal to (p —1), can 
be represented in the same way as for the design: v = 1/2 p(p —1), 
k = (p —1), illustrated in Scheme 1 for p = 5. They have given this 
type of p.b.i.b. designs the name: Triangular Type. In particular, the 
design for v = 1/2 p(p —1) and k = (p —1) has been classified by them 
under the name Triangular Singly Linked Blocks (TSLB). 

It is fairly obvious that Scheme 2 can be used to represent the 
association scheme of any triangular type design. 

If the basic TSLB design involving only two replications of each 
treatment is repeated, say, / times in the experimental lay-out, so that 
there are / basic groups each having p blocks of (p —1) plots, the result- 
ing design may be considered as a partly resolvable p.b.i.b. design 
having the following parameters: 


v = (1/2)p~—1), Kk=(P—-1), r=2l, Y=1, =0 


and n, , N , Py, remaining the same as in the basic TSLB design. 
The method of analysis for both the cases will be discussed in the 
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next two sections and a numerical example for the analysis of the basic 
design given in the last section of the paper. 


Analysis of the Basic TSLB Design. 


The statistical analysis of the data of an experiment laid out using 
triangular singly linked blocks can be performed by direct substitution 
in the general formulae for p.b.i.b. designs having 2 associate classes 
(See Nair, 1952). By using association Scheme 2, the resulting ex- 
pressions for estimates of treatment effects, their variances and the 
components:-of the analysis of variance can be very much simplified. 

The values of the quantitative character (x) under study for the 
two replications of treatment (77) will, for the convenience of analysis, 
be denoted by z;; for the plot belonging to the ith block and by z;,; for 
the plot belonging to the jth block. The whole data can then be pre- 
sented as in Table 1 below. 


Block Block 
Number Total 
(h) 
| 
| * 
(2) T2.p-1 Ly 
(p — 1) Tp-1,p| T(p-1). 
( * 
Pp Xp3 Lp, p-1 Lp. 
Total iz, Ss | = Grand Total 
TABLE 1 
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Let T;; denote the total value of xz for treatment (ij). Then 7;; = 
z;; + x;,. Values of T;; may be presented as in Table 2. 


Tos Tx 
Tp-1.p 
TABLE 2 


The general formula (37) for estimating the effect of a treatment, 
with recovery of inter-block information, given in the author’s (1952) 
paper simplifies for this particular design to the form 


where, 


(w — w’) 
pw + (p — 2)w’ @) 


The adjusted treatment total with recovery of inter-block informa- 
tion is therefore, 


— — 2.) + (2;. — 2.,)} (3) 


Dividing by 2, we get the corresponding adjusted treatment mean. 

To get the corresponding intra-block estimates, we have only to 
substitute w’ = 0 or » = 1/p in (1) and (8). 

To estimate w and w’ and from them the value of » we have to 
perform the analysis of variance on the data of Table 1. 

The total sum of squares, the tredtment sum of squares (unadjusted 
for block effects) and the block sum of squares (unadjusted for treatment 
effects) are calculated in the usual way. They are: 


Total SS. = 2, (x5; Zia) 1) (4) 
Block S.S. (unadj.) = 5 — (5) 


Treatment S.S. (unadj.) = (1/2) (6) 
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It is easier in this design to first calculate the sum of scuares for 
blocks (adjusted for treatment effects) than to calculate the sum of 


squares for treatments (adjusted for block effects). 


given by the expression 


Block (adj.) = (es. 24)" 


The former is 


(7) 


The latter is then calculated by subtracting (5) from the total of (6) 


and (7). 


The analysis of variance is presented in Table 3. 


TABLE 3. ANALYSIS OF VARIANCE 


Source of Variation | Degrees of Freedom| Sum of Squares Mean Square 
Blocks (adj.) (p — 1) See formula (7) EB, 
Treatments (unadj.) |(1/2)(p + 1)(p — 2)| See formula (6) 

Intra-block error (1/2)(p — 1)(p — 2)| (4) — (6) — (7) E, 

Total y-e-) See formula (4) 
Blocks (unadj.) (p — 1) See formula (5) 
Treatments (adj.) |(1/2)(p + 1)(p — 2)| (6) + (7) — (5) E, 


The variance ratio F = E,/E, provides a test of significance of inter- 


block estimates of treatment effects. 


The author (1944) had shown that for a non-resolvable incomplete 
block design (such as this) the estimates of w and w’ are given by 


w= 


1 


v(r — 1) 


U 


~ — DE, — — KE, 


(8) 


Substituting v = (1/2) p(p —1), k = (p —1), r = 2, b = p, we get 


P 
— DE, — @ — DE. 
Hence, estimate of » follows, 
E, E, 
ma (10) 


For calculating the lowest significant differences among the values 
of ¢;; given by (1) we have to calculate the variance of the difference 
between every pair of them. These pairs fall into two groups, namely, 
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those pairs which occurred together in the same block (i.e., those treat- 
ments with one digit common) and those pairs which did not occur 
together in the same block (i.e., those treatments with no digit common). 

(i) Variance of the difference between two treatments with one 
digit common can be obtained by direct substitution in formula (39) 
of the author’s (1952) paper. It simplifies to the form 


1 
pat» (11) 


(ii) Variance of the difference between two treatments with no digit 
common can be obtained by substitution in formula (40) of the author’s 
paper. It simplifies to the form 


1 
+ 2n) (12) 


(iii) The mean variance for differences for all pairs of treatments is 
derivable from formula (41) and simplifies to the form 


1), + (13) 


By substituting u = 1/p in (11), (12) and (13) we get the correspond- 
ing variances for intra-block estimates of treatment effects. 


Analysis when the Basic TSLB Design is Repeated Several Times. 


Let the basic TSLB design be repeated | times in | compact groups 
of p blocks each in the field, the p blocks of each group being inde- 
pendently randomized. In every basic group the blocks will be numbered 
in such a way that block no. (h) will be the one containing the (p —1) 
treatments (1, h), (2, h), (kh —1, A), (h,h + 1), (A, p). These 
treatments will be randomized within this block. 

Let 2;;, and z;,, be the values for treatment (77) in the plots allotted 
to it in block numbers (7) and (7) of the kth basic group. Let 


1 
kel k=) 


We have to set up a table of values of z,;;. and z;;. similar to Table 
1 and calculate the marginal totals z,,. and z.,. ; and the grand total 
z,... The total for treatment (77) will be = + 2;;. . 

The total for block no. (h) of the kth basic group will be z,., and the 
total for the group will be z.., . 
Estimate of effect of treatment (ij) with recovery of inter-block 
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information is 
= 21 u{(z;.. + z.;.)} (p — (14) 


where » has the same value as in (2). 
The corresponding adjusted treatment total is 


> 2.4.) + — (15) 


Dividing (15) by 2/ we get the corresponding adjusted treatment mean. 

To obtain intra-block estimates we have to replace u by 1/p. 

In the analysis of variance the (lp —1) degrees of freedom for 
blocks can be split up into three components of (1 —1), (p —1) and 
(l —1) (p —1) degrees of freedom. The first and third components are 
orthogonal to treatment ee Their sums of squares are 


pp 2.) (16) 


The expectation of (16) involves w, w’ and the inter-group variance 
between blocks of size p(p —1). It is therefore not available in esti- 
mating w’. 

The expectation of (17) is (l —1) (p —1)/w’. 

The second component is non-orthogonal with treatment com- 
parisons. Its sum of squares after eliminating treatment effects is 


Bip — (18) 


Using the results of Table VIIB of Bose and Shimamoto’s paper 
we can show that the expectation of (18) is 1/2[(p)/w’ + (p — 2)/w). 

To get the best estimate of w’ we should combine (17) and (18). 
This has been done in the analysis of variance shown in Table 4. 

Estimates of w, w’ and yu in terms of FE, and EL, are given below 


w= (19) 
’ 2l(p — 1) — (p — 2) 


+ — DE. 


‘ 
f 
= 
: 
: 
(E, — E.) . 
(21) 
: 


(fps) 


3 (-@O+@ Dt + O2/1 (3) 
E td 1 (I d) (‘fpeun) 
= (I - — + I-(—@q 1330], (9) 
(eT 
(fps) 
— x) (I — sdnoi oweq 
(4) 
"2 <) 1-2 sdnoi3 oweg (8) 
erenbg uvayy serenbg jo ung 
GONVIUVA dO SISATVNV ATAVL 
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To calculate the variances of differences between adjusted treatment 
means with recovery of inter-block information we have to divide (11), 
(12) and (13) by J. 


Numerical Example. 


The data used in this example were made available to me by Dr. 
W. J. Youden of the Statistical Engineering Laboratory, National 
Bureau of Standards, Washington, D.C. from a Technical Report 


‘ (unpublished) by John E. McKinney, George E. Decker and Frank L. 


Roth of the Bureau’s Polymer Development branch, Research and 
Development Division. 

Physical properties of sampies from 10 bales of a particular brand 
of synthetic rubber were measuicd in order to determine values of the 
properties for the lot and to measure the uniformity of the materia! 
throughout the lot. 

Two compounded batches were prepared from samples of each of 
the 10 bales. Since there was not sufficient material from any one 
bale to make more than two tests and since it was not possible to test 
all the bales in a single day, the scheme in Table 5 was used. 


TABLE 5. ALLOCATION OF BALES 


Day | Baie Number 
(1) | 1 | 2 | 3 4 
(3) 2 5 | 8 9 
(4) 3 6 8 10 
(5) 4 7 9 10 


It will at once be noticed that if ‘days’ and ‘bales’ denote ‘blocks’ 
and ‘treatments’ respectively, the scheme in Table 5 is identical to 
Scheme 1 for linked block designs for the case p = 5 discussed earlier. 

Table 6 shows the values for one of the properties, namely, strain 
at 400 psi. Each value is the median for 3 specimens from a vulcanized 
sheet. 
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TABLE 6. DATA 


Day Strain (%)* at 400 psi 
(1) 35 20 13 25 
(2) 16 16 21 27 
(3) 10 5 20 15 
(4) 26 24 37 31 
(5) 21 16 20 17 


Using Scheme 2 for numbering the bales and Table 1 for presentation 
of the data we shall rearrange the data of Table 6 as shown in Table 7. 


TABLE 7. RE-ARRANGED DATA WITH MARGINAL CALCULATIONS 


Total 
| (a) | — Za | — Zr) 


= 
é 
152 
het 
Le *Coded—300 
| 
Block 
4 
No. 
(1) 
ai i 35 | 20 | 13 | 25 | 93 | 73 +20 +2.94 
(2) 
Le 16 16 | 21 | 27 | 80 | 80 0 0 
(3) 
10 5 20 15 50 93 —43 
ta | ta | Za 
(5) 
sh ee 21 | 16 | 20 | 17 74 | 98 | — 24 —3.53 
Total 
oll 
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The treatment totals are tabulated below: 
(12) (13) (14) (15) 


51 30 39 46 
(23) (24) (25) 


21 45 48 
(34) (35) 
57 35 
(45) 
48 
(415)? 
(a) Total sum of squares = (35)? + --- + (17)? — — 1207.75 
(b) Block sum of squares _ (93)* + --- +(74)’ — (415)? _ 626.00 
(c) Block sum of squares = 
(d) Treatment sum of _ (51)? + --- + (48)? — (415)? _ 504.25 
squares (unadjusted) 2 20 
(e) Treatment sum of 
squares (adjusted) 
(f) Intra-block error 
sum of squares = (a) —- ©) —- @ = (a) — (6) — €) 


= 200.10 


These sums of squares can now be presented in the following table 
of analysis of variance. 


Source of variation Degrees of Sum of Squares | Mean Square 
Freedom 

Blocks (adj.) 4 503 . 40 125.85 = E, 
Treatments (unadj.) 9 504.25 

Intra-block error 6 200.10 33.35 = E, 
Total 19 1207.75 
Blocks (unadj.) 4 626 .00 

Treatments (adj.) . 9 381.65 42.41 = E, 
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The variance ratio F = E,/E, = 1.27 is not significant. 
Estimate of yu is 


Pee E, — E, = 0.1470 


Each value of z in the Ath block may now be adjusted by subtracting 
from it either u(x,, — x.,) or 1/p (x,. — z.,) according as the adjustment 
desired is with or without recovery of inter-block information. The 
adjusted values for the former case are given below. 


Block Number 
(1) * 32.06 17.06 10.06 22.06 
(2) 16.00 * 16.00 21.00 | 27.00 
(3) 16.32 11.32 . 26.32 | 21.32 
(4) 19.09 17.09 30.09 * | 24.09 
(5) 24.53 19.53 23.53 20.53 | * 


The corresponding adjusted treatment totals are: 
(12) (18) (18) 


48.06 33.38 29.15 46.59 
(23) (24) (25) 


27.32 38.09 46.53 


(34) (35) 

56.41 44.85 
(45) 
44.62 


The corresponding adjusted treatment means are: 
(12) (18) (14) (15) 


24.0 16.7 14.6 23.3 
(28) (24) (25) 
13.7. 19.0 23.3 
(34) (35) 


28.2 22.4 
(45) 


22.3 
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In these triangular tables the easiest way to spot 1st associates is to 
remember that any two treatments with one digit in common would 
have occurred together in the same block e.g. treatments (12) and (25) 
occur together in block no. (2). Those with no digits common would 
not have occurred in the same block and are 2nd associates e.g. treat- 
ments (12) and (34). 


(a) 


(b) 


(c) 


To 


The variance of difference between means, adjusted with 
recovery of inter-block information, of two treatments occurring 
together in the same block (1st associates) is given by 


“ (1 +») = 33.35 X 1.147 = 38.25245 


The lowest significant difference between adjusted means at 
5 and 1% levels are 

LS.D..o5 = 2.447 X V38.25245 = 15.13 

LS.D..o, = 3.707 X V38.25245 = 22.93 


The variance of difference between means, similarly adjusted, 
of two treatments not occurring together in the same block 
(2nd associates) is given by 


-( + 2u) = 33.35 X 1.294 = 43.15490 


The lowest significant difference between adjusted means at 
5 and 1% levels are 

LS.D..o; = 2.447 X V43.15490 = 16.07 

L3S.D..« = 3.707 x Vv 43.15490 = 24.35 
The mean variance for differences for all pairs of adjusted 
treatment means is 


4( 4 4.) = 33.35 X 1.196 = 39.88660 


If we are thinking of using one common L.S.D. for every pair 
of adjusted means, its value for 5 and 1% levels are 

LS.D..os = 2.447 X V/39.88660 = 15.46 

LS.D..o. = 3.707 X V/39.88660 = 23.41 


get corresponding values of L.S.D. for intra-block adjusted 


treatment means we have only to replace u by 1/5. These L.S.D. values 
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are strictly valid, but not the L.S.D. values calculated above for means 
adjusted with recovery of inter-block information owing to the fact 
that u is estimated from the data and not known a priori. 

Finally, I wish to express my thanks to Professor Gertrude M. Cox 
for her continued interest in this work which was done at the Institute 
of Statistics, The Consolidated University of North Carolina, Raleigh, 
during the tenure of a joint Fulbright and Smith-Mundt Fellowship 
awarded to the author by the U.S. Government. My thanks are also 
due to Dr. W. J. Youden for supplying me the experimental data for 
the numerical example. 
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SPLIT-PLOT HALF-PLAID SQUARES FOR IRRIGATION 
EXPERIMENTS’ 


Water C. Jacos? 


The rapid developments in the field of experimental design have 
provided many designs which appear to be extremely complicated. 
It seems desirable to publish examples of the use of various designs 
so that other research men will have the benefit of knowing how the 
design actually worked in practice. It is the purpose of this paper to 
present an example of the use of a complex design in vegetable crops 
research. The reasons for the choice of the design and the various 
steps involved in the analysis and interpretation of the data are given 
in sufficient detail so that the non-statisticians will be able to utilize 
such a design. 

One of the major projects of the Long Island Vegetable Research 
Farm is the determination of the fertilizer requirements of the various 
vegetable crops including potatoes. Recommendations have been made 
for fertilizing potatoes, but the advent of extensive irrigation of potatoes 
on Long Island raised the question as to how much difference there 
would be in fertilizer requirements of potatoes under irrigated con- 
ditions compared to potatoes grown without irrigation. Many new 
varieties of potatoes have been recently introduced and preliminary 
information indicated that some of the newer varieties had fertilizer 
requirements which differed considerably from the requirements of the 
standard varieties in use on Long Island. 

The specific problem for which this experiment must furnish an 
answer could be stated as follows: 

“What influence does irrigation have on the nitrogen, phosphorus, 
and potash requirements of different varieties of potatoes grown under 
Long Island conditions?” Thus there were five factors to be investi- 
gated, namely irrigation, nitrogen, phosphorus, potash and variety. 


1Paper No. 310, Dept. of Vegetable Crops, Cornell University, Ithaca, New York. 
*Profeasor of Vegetable Crops, Cornell University, Ithaca, New York. 
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SELECTION OF THE DESIGN 


The area of land available for this experiment consisted of two 
large blocks each 214 feet by 260 feet in size. Irrigation was equally 
available over the whole area with the one restriction that all rows must 
parallel the irrigation lines and thus must run the 260 foot length of 
each block. Irrigation plots require large isolation borders and, since 
there was little interest in this experiment in testing the benefits of 
irrigation, it was decided to divide each block in half, one half to be 
irrigated and the other not. In this way any effect of irrigation would 
not be confounded with blocks and the estimation of fertilizer and 
variety interactions with irrigation could be viewed with more con- 
fidence. These whole plots consisted of 36 rows of potatoes each about 
260 feet long. Four rows were left between plots for irrigation border 
areas. 

Previous work (2)* has indicated the desirability of having at least 
one guard row on each side of a fertilizer plot to eliminate border effect. 
Thus two rows of each plot will have to be discarded and a minimum 
size plot would be three rows wide. To utilize the experimental area 
more efficiently and to provide some estimate of within plot variability, 
plots 4 rows in width were chosen as the smallest practical size. Using 
three levels each of nitrogen, phosphorus, and potash and three varieties, 
there were 81 treatments to be applied to each whole plot. Each whole 
plot had 9 sub-plots in the 36 row width and thus for 81 treatments 
each whole plot would have 9 plots down the length of the potato rows. 
Since each whole plot was over half an acre in size it was considered 
advisable to select an arrangement of the sub-plots so that soil variability 
within each whole plot could be removed at least partially from the 
estimate of error. Yates (5) has constructed a number of confounded 
arrangements in Latin squares. By confounding some of the inter- 
actions with the rows and columns of the squares, differences among the 
rows and columns can be eliminated from the experimental error. The 
81 treatments can be arranged in a 9 x 9 quasi-Latin square by con- 
founding portions of second and third order interactions. The potato 
rows corresponded in direction with the rows of the Latin square. Each 
plot was 28 feet long. One of the factors involved was variety and it 
was impractical to change variety at planting time for each plot. By 
applying only one variety to each row of the square we have the design 
described by Yates (5) as a half-plaid Latin square. Although this 
feature of the design was desirable from the standpoint of practicability 
in field operation it should be pointed out that considerable precision 


*Numbers in parentheses refer to Literature Cited at end of paper. 
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was lost in the varietal comparisons. The comparison of varieties was 
not as important as some of the interactions between varieties and 
fertilizer factors so the loss in this comparison was felt to be justified by 
the increased precision available for the interactions and the saving in 
time and labor in installation of the experiment. 

Figure 1 shows the construction form of the basic square before 
randomization (c. f. Yates (5)). This basic square was used in preparing 
the field layout for all four whole plots. Each plot was obtained from 
this basic square by a separate randomization of the rows and columns. 


FIGURE 1. THE 9 X 9 HALF-PLAID LATIN SQUARE BEFORE RANDOMIZATION ' 


NPK “W” Confounded 


lll 213 312 122 221 323 133 232 331 
313 112 211 321 123 222 332 131 233 
212 311 113 223 322 121 231 333 132 


NPK “X”’ Confounded 
~ 

8 

8 


Letters represent the three varieties. 
Numbers represent levels of N, P, and K respectively. 


Figure 2 shows the final field plan. The letters A, B, and C represent 
the three varieties which apply to the whole row of the square while 
the groups of three numbers indicate the levels of nitrogen, phosphorous, 
and potash, respectively. To confound interactions of factors having 
three levels each Yates (5) has prepared a formal subdivision of the 
degrees of freedom. The portion of the N P K interaction designated 
as “W” has been confounded with columns of the square and the “X” 
component has been confounded with rows. Thus those components of 
the NP K V and NP KI and NP K VI have also been confounded 
with rows or columns of the squares because of the split-plot half-plaid 
features. The actual planting was done with a special planter to be 
described in a later publication. The yields from each of the two middle 
rows for each of the 24 feet long plots were recorded separately. This 
gave information concerning the within plot variability or sampling 
error. This procedure would not always be necessary but sometimes 
knowledge of the sampling error is desirable for use in future work in 
planning experimental designs. 
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T€Z | | | | S88 | SIS | SE | | ITE ad | | | SIT | TES | SIS | | ISI | SET q 
€2E | | 12% | SES | SSI | | IIT | | a | 12% | 28S | SIE | SET | IIT | | | re) 
| ILE | | | | SET | SIZ | | | SIZ | | SEE | | | SIT | ITE | Vv 
| | | | | TIL | [2% | | | ZTE | | | 122 | | | V 
| SET | IIT | | STE | 122 | |. | EZE V 22S | SES | 112 | | SIT | | Tel | Zee | gq 
| | ES | TIS | | SIE | | | SIT | | | | | SE | See | | SIZ 
ITE | | SIZ | | EIT | SE | ZEIT | | zee | | | | 22S | SES | | SIT | 
GIT | | STE | | | | See TEI | a ZIT | | Tel | | | STE | IZE | | SES V 
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TABLE 2. YIELDS OF U.S. NO. 1 TUBERS TABULATED BY TREATMENTS 


Irrigated 
A B 94 
NPK 

1 2 Total 1 2 Total 1 2 Total 

lll 43 68 lll 65 58 123 56 46 102 
112 74 63 137 62 82 144 50 62 112 
113 64 71 135 59 39 98 56 44 100 
121 74 88 162 52 64 116 70 65 135 
122 76 &3 129 58 44 102 66 39 105 
123 53 70 122 65 34 99 34 50 84 
131 64 58 122 64 37 101 68 56 124 
132 69 85 154 32 61 93 67 69 136 
133 43 86 129 60 40 100 62 76 138 
211 73 89 162 74 68 142 52 44 96 
212 81 68 149 ff 48 125 64 89 153 
213 72 103 175 69 73 142 61 69 130 
221 80 60 140 33 93 126 72 69 141 
222 57 96 153 83 80 163 64 60 124 
223 $5 65 160 70 | . 35 105 57 70 127 
231 98 66 164 76 79 155 70 86 156 
232 i 67 148 87 78 165 67 64 131 
233 54 94. 148 83 70 153 62 60 122 
311 62 86 148 69 84 153 55 90 145 
312 92 88 180 77 84 161 71 72 143 
313 7 72 143 46 80 126 94 76 170 
321 75 70 145 77 74 151 76 41 117 
322 60 | 109 169 73 79 152 58 80 138 
323 4 88 182 77 96 173 78 62 140 
331 86 80 166 67 102 169 57 a1 128 
332 79 90 169 65 92 157 76 60 136 
333 98 92 190 70 80 150 78 76 154 
1968 | 2125 | 4093 | 1790 | 1854 | 3644 | 1741 | 1746 | 3487 
11,224 


ANALYSIS OF DATA 


Table 1 is a tabulation of the yield in pounds of U.S. No. 1 tubers 
from each of the two plot rows, arranged to correspond to the planting 
plan of Figure 2. This arrangement of the data is needed for the calcu- 
lation of the total sum of squares and the sums of squares for rows and 
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TABLE 2—Concluded 


Block totals—10,394 and 10,404 


columns in the analysis of variance, Table 4. The data in Table 2 are 
tabulated as in Table 1 by adding the two single row figures for each 
plot and identifying the plot treatments from Figure 2, 
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Not Irrigated 
Total Total | 1 2 | Total 
ey ie 111 120 80 56 48 | 104 
112 101 9 | 46 | 71 | 
Peel Bi: 113 114 120 67 49 | 116 
121 136 101 68 45 | 113 
122 117 136 53 41 94 
pat. # 123 96 116 59 65 | 124 
131 107 120 | 52 | 64 | 116 
as 132 106 117 | 67 | 33 | 100 
133 101 99 68 50 | 118 
98 13 | 64 | 63 | 127 
213 112 125 56 64 120 
r 7 r 221 67 | 123 126 | 70 | 44 | 114 
fins Ul ag 222 51 | 112 130 | 57 | 70 | 127 
il. <8 223 62 | 122 102 | 68 | 60 | 128 
sa 231 56 | 112 118 | 64 | 62 | 126 
i yi 232 56 | 124 1388 | 59 | 66 | 125 
eras |e 233 52 | 123 12 | 63 | 78 | 141 | 
# ia | 311 64 | 134 117 70 48 | 118 
ie 312 58 | 130 123 | 70 | 58 | 128 
| 
At 313 67 | 125 11 {| 81 | 54 | 105 
se 321 55 | 119 135 | 74 | 68 | 142 
aie 322 60 | 120 105 | 59 | 67 | 126 
‘by HS 323 82 | 146 131 | 54 | 58 | 112 
ye 331 62 | 124 118 | 57 | 58 | 115 
1 a 332 50 | 114 111 64 70 | 134 
aie 333 62 | 126 | 12 | 54 | 64 |] 118 
ry ; mi 1mm | 1512 | 3192 | 1 3152 | 1650 | 1580 | 3230 
Poste 
tee 
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TABLE 3. TREATMENT TOTALS TABULATED FOR COMPLETE FACTORIAL 


165 


ANALYSIS 
Irrigated Non-Irrigated Total 
NPK t Total 
A B C Total A B Cc Total A B Cc 
lll 111 123 102 336 120 80 104 304 231 203 206 640 
112 137 144 112 393 101 98 117 316 238 242 229 709 
113 135 98 100 333 114 120 116 350 249 218 216 683 
Total 383 365 314 | 1062 335 298 337 970 718 663 651 2032 
121 162 116 135 413 136 101 113 350 298 217 248 763 
122 129 102 105 336 117 136 94 347 246 238 199 683 
123 123 99 84 306 96 116 124 336 219 215 208 G42 
Total 414 317 324 | 1055 349 353 331 | 1033 763 670 655 2088 
131 122 101 124 347 107 120 116 343 229 221 240 _ 680 
132 154 93 136°} 383 106 117 100 323 260 210 236 706 
133 129 100 138 367 101 99 118 318 230 199 256 685 
Total | 405 204 398 | 1097 314 336 334 984 719 630 732 2081 
1T1 395 | 340] 361 | 1096 | 363 | 301] 333) 758 | 641 | 694 2093 
1T2 420 339 352 | 1112 324 351 311 986 744 690 B64 2098 
1T3 387 297 322 | 1006 311 335 358 | 1004 698 | 632 680 2010 
Total | 1202 | 976 | 1036 | 3214 | 998 | 987 | 1002 | 2987 | 2200 | 1963 | 2038 6201 
211 162 142 96 400 | 130 112 122 364 292 254 218 764 
212 149 125 153 427 98 113 127 338 247 238 2380 765 
213 175 142 130 447 112 125 120 357 287 267 250 804 
Total 486 409 379 | 1274 340 350 369 | 1059 826 759 748 2333 
221 140 126 141 407 123 126 114 363 263 252 255 770 
222 153 163 124 440 112 130 127 369 265 293 251 809 
223 160 105 127 392 133 102 128 352 282 207 255 744 
Total 453 394 392 | 1239 357 358 369 | 1084 810 752 761 2323 
231 164 155 156 475 112 118 126 356 276 273 831 
232 148 165 131 444 124 138 125 387 272 303 5 
233 148 153 122 423 123 125 141 389 271 278 
Total 460 473 409 | 1342 359 381 392 | 1132 819 854 
2T1 466 423 393 | 1282 365 356 362 | 1083 831 779 
2T2 450 453 408 | 1311 334 381 379 | 1094 784 834 
2T3 483 400 379 | 1262 357 352 389 | 1098 840 752 
Total | 1399 | 1276 | 1180 | 3855 | 1056 | 1089 | 1130 | 3275 | 2455 | 2365 
311 148 153 145 446 134 117 118 369 282 270 
312 180 161 143 484 130 123 128 381 310 284 
313 143 126 170 439 125 lll 105 341 268 237 
Total 471 440 458 | 1369 389 351 351 | 1091 860 791 
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TABLE 3. Concluded 


Irrigated Non-Irrigated Total 


The plot totals for the two replicates are then added and this total is 
transferred to a tabulation sheet set up as in Table 3. All of the sub- 
totals necessary for the complete analysis of the factorial components 
are then obtained, by proper addition. 

The partition of the degrees of freedom as given in Table 4 follows a 
combination of split-plot and half-plaid Latin square breakdowns. 
The whole plots are the four irrigation plots and ‘the three degrees of 
freedom are divided one for irrigation, one for blocks and one for error 


| 
166 
A B C |Total| A B C A B Cc 
321 | 145| 117| 413] 135] 142] 396] 264| 286| 250|  g09 
322 | 169] 152] 138| 459] 120] 105] 126| 351| 289| 257] 264 810 
323 | 182] 173] 140] 495| 146] 131] 389] 304] 252 884 
Total | 496 | 476 | 395 | 1367| 385] 380| 1136 | 8s1| 2503 
331 | 166| 169| 128| 463] 124] 118] 115|:357| 200| 287| 243| 820 
332 | 169) 157] 136] 111} 134] 359] 283| 268] 270 821 
333 | 190] 150| 154 | 494| 126] 125] 118] 369| 316 | 272 863 
Total | 525 | 1419 | 354] 1085] 889| 830| 785| 2504 
3T1 | 459] 473] 390 | 1322 | 377] 370| 375 | 1122] 836| 843 | 2444 
3T2 | 518 | 470] 417] 1405} 364] 339] 1091 | 882] 809] 2496 
3T3 | 515 | 464 | 1428 | 397/| 367| 335| 1099] 912| 816 | 799| 2527 
pee . ime Total | 1492 | 1392 | 1271 | 4155 | 1138 | 1076 | 1098 | 3312 | 2630 | 2468 | 2369 | 7467 
| 421 | 418] 343 | 1182 | 384! 309 | 344 | 1037] 805 | 727| 687] 2219 
T12 | 466] 430] 408 | 1304 | 334| 372| 1035| 795| 764] 780| 2339 
Va T13 | 453 | 366 | 1219/ 351] 356] 341 | 1048| 804 | 722] 741] 2267 
ela ye “ Total | 1340 | 1214 | 1151 | 3705 | 1064 | 999 | 1057 | 3120 | 2404 | 2213 | 2208 | 6825 
T21 | 447] 393 | 1233] 362] 369| 1109| 825| 762] 2342 
T22 | 451 | 417] 367 | 1235 | 349] 371 | 347 | 1067| 800; 714! 2302 
T23 | 465 377] 351 | 1193] 364 | 349| 364 | 1077| 829| 726| 715 2270 
co a ai Total | 1363 | 1187 | 1111 | 3661 | 1091 | 1082 | 1080 | 3253 | 2454 | 2269 | 2191 | 6914 
T31 | 452] 425 | 408 | 1285| 343] 356] 357| 1056 | 765! 234% 
T32 | 471] 415] 403 | 1289 | 366 | 359/| 1069| 815! 781; 762; 2358 
T33 | 467| 403 | 414 | 1284| 350] 349] 377| 1076 | 817] 752] 791] 2360 
— | — | — | — | — | — — | — — 
a Pee Total | 1390 | 1243 | 1225 | 3858 | 1037 | 1071 | 1093 | 3201 | 2427 | 2314 | 2318 | 7059 
e. "| ee TT1 | 1320 | 1236 | 1144 | 3700 | 1105 | 1027 | 1070 | 3202 | 2425 | 2263 | 2214 6902 
") ee TT2 | 1388 | 1262 | 1178 | 3828 | 1022 | 1071 | 1078 | 3171 | 2410 | 2333 | 2256 6999 
net TT3 | 1385 | 1146 | 1165 | 3696 | 1065 | 1054 | 1082 | 3201 | 2450 | 2200 | 2247| 6897 
toy 4093 | 3644 | 3487 |11224 | 3192 | 3152 | 3230 | 9574 | 7285 | 6796 | 6717 | 20798 
| 
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TABLE 4. ANALYSES OF VARIANCE 


Degrees of Sum of 
Source of Variance freedom squares Variance 
Total 638 35 , 700.07 
Sub Total 323 31,002.07 
Irrigation (1) 1 4,201.40 4,201.40 
Blocks 1 16 
Error (a) 1 301.47 301.47 
Rows 32 4,731.82 
Variety (V) 2 876.53 438 .26 
VI 2 983 .38 491.69* 
Error (b) 4 275.94 68.98 
Columns 32 7,104.26 
Nitrogen (NV) 2 3,980.51 1,990. 26*** 
Phosphorus (P) 2 129.17 64.58 
NP 4 112.28 28.07 
Potash (K) 2 30.62 15.31 
NK 4 103 .00 25.75 
P= 4 109.89 27.47 | 
NI 2 884.60 442.30*** 
PI 2 152.02 76.01 
NPI 4 15.70 3.92 
KI 2 79.51 39.76 
NKI 4 197.04 49.26 
PET 4 64.13 16.03 
NV 4 162.14 40.54 
PY 4 91.37 22.84 
NPV 8 386.24 48.28 
KV 4 117.25 29.31 
NKV 8 361.44 45.18 
PKV 8 190.74 23.84 
NIV 4 223 .84 55.96 
PIY 4 63.53 15.88 
NPIV 8 767 .03 95 .88** 
4 201.99 50.50 
NKIV 8 434.32 54.29 
PEKEIV 8 277.88 34.74 
Error (c) 148 5,526.72 37 .34 
Sampling Error 315 4,698.00 14.91 


*Significant at P < .05 
*Significant at P < .01 
Significant at P < .001 
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(a) (irrigation X< blocks interaction). The degrees of freedom for the 
sub-plots are then partitioned as for four Latin squares. The four 
sets of 8 degrees of freedom for rows from each square are combined to 
give 32 degrees of freedom total for rows. Because of the half-plaid 
characteristic some of the row effects are confounded with variety 
effects. The 32 degrees of freedom for rows are divided into 2 for 
variety, 2 for V J, 4 for error (b) and the remainder. Error (b) is made 
up of 2 degrees of freedom for V X blocks and 2 for V I X blocks. There 
are also 32 degrees of freedom for columns, obtained in the same manner 
as for rows. The rest of the degrees of freedom are obtained in the usual 
manner of obtaining all the main effects and interactions. The only 
exception is NP K, NPKV, NPKI and NPKVI. ‘These are 
partially confounded with rows and columns and wil! not be calculated. 
Yates (5) gives a method for calculating the unconfounded portions of 
these interactions but the effort is rarely rewarded in cases of inter- 
actions of high order. Thus the confounded degrees of freedom of 
these N P K interactions are included in rows and columns and the rest 
are lumped together with error (c). One other characteristic of this 
particular experiment was that the records from one row of potatoes 
were lost. In Table 1 these have been listed as duplicates of the other 
row of the plot in the next to the bottom row of plots of Biock 1. These 
9 degrees of freedom are thus lost from the sampling error and this 
accounts for 638 instead of 647 for total and 315 instead of 324 for 
sampling error. 
The computations are done as follows: 


1. The total sum of squares is obtained from the half-plot observations 
in Table 1 as follows: 


S joe? 2 2, _ (20798)? 


2. The sub-total sum of squares is obtained from the plot totals in 
Table 2. 


(43° + 68° + --- 647) _ (20798) 
; 2 648 


3. The sampling error is the difference between the sub-total and the 
total sums of squares. 


4. The irrigation sum of squares is obtained from the totals in Table 3. 
(11224)* + (9574)* (20798)* 
324 648 
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5. 


The block sum of squares is obtained from block totals in Table 2. 


6. The four totals needed for the Block X Irrigation Interaction, 


=I 


12. 


13. 


whick is error (a), are found in Table 1. 


(5499)? + (4895) + (4679) + (5723)? (20798) _ SS. for 
1€2 648 blocks and 
irrigation 


. The row sum of squares is obtained from the row totals of Table 1. 


26 2 2 AN 2 2 SS. for blocks, 
1 


error (a). 


. The column sum of squares is obtained from the column totals of 


Table 1 in the same manner. 


. Calculate the sum of squares for varieties using totals from Table 3. 
. The VI interaction is obtained by using the totals in Table 3. 
. Error (b) is calculated as follows, using totals from Table 2. 


2 SS. for 
error (a). 


The sums of squares for N, P, K and the interactions given in 
Table 4 are calculated according to standard factorial procedures 
(5) using appropriate totals from Table 3. For example: 


2 2 2 2 
SS. for N = + (7130)’ + (7467) _ (20798) 


216 648 

9 2 2 2 2 

(2032)" + (2088)" + --- (2504)" (20798) 

SS. for NP = 648 

— §.S8. for N and P 

18 2 2 2 ‘Q\2 
(1062)* + (970)" + --- (1085)" (20798) 

SS. forN PI = 36 648 


— SS. for N, P, NP, I, NI, PI 


Experimental error (c) is obtained by subtraction of the components 
already computed from the sub-total sums of squares. 


Further details of computation may be found in Yates (5), Snedecor 


(4) under general description of analysis in factorial experiments, and 


¢ 
Ny 
9 
©) 
ene 
4 
' 


170 BIOMETRICS, JUNE 1953 


Cochran and Cox (1) Chapter 8. Significance tests and levels of sig- 
nificance can be found in Snedecor (4). 

The analysis may be carried further to single degree of freedom 
comparisons as described by Yates (5) and Snedecor (4). This step is 
used in interpretation but will not be described here. 


INTERPRETATION OF RESULTS 


The significance of the various main effects and interactions must be 
determined by comparison of the variances with the proper error 
variance. Irrigation can be tested only with error (a) and no signifi- 
cance is indicated. Since the design selected had a low precision for 
testing the effect of irrigation, small differences could not be detected 
in this experiment. The large variance due to irrigation might indicate 
that some advantage could be gained by analyzing the irrigated and 
non-irrigated sections separately. Since the separate analyses did not 
in this case change the interpretations they are omitted for simplicity. 

Even with the few degrees of freedom available the variety irrigation 
interaction was significant. The interpretation of this interaction is 
facilitated by plotting the means as in Figure 3. From this it is seen 
that without irrigation all varieties yielded about the same but with 
irrigation Variety A was much higher than B or C and there was no 
difference between B and C. Thus the interaction was the marked 
response of Variety A to irrigation compared to the lesser response of 
B and C. 

There was a significant response to nitrogen. Examination of the 
means (Table 5) reveals that there was a significant increase with each 


TABLE 5. RESPONSE TO NITROGEN APPLICATIONS 


Mean yield in Ibs. 
Level of Nitrogen per plot row 
28.71 
33.01 
Ns 34.57 


increment of N even though the response to the second increment was 
much less than the response to the first increment. A more convenient 
method of analyzing the N effect is to break out the two individual 
degrees of freedom. This is logically done by the linear and quadratic 
terms as described by Yates (5). This divides the 3980.51 sum of 
squares for N into 3710.08 for linear and 270.41 for quadratic. Both 
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Mean plot yield in pounds 


Irrigated 


jot Bot irrigated 


a B 
Variety 
FIGURE 3. INFLUENCE OF IRRIGATION ON YIELD RELATIONSHIPS 
OF THE VARIETIES 


terms are significant when compared by F test to the error (c). This 
means that although there was a marked linear response to N the second 
increment was less effective than the first. 

Further examination of the analysis of variance indicates a signifi- 
cant N J interaction. Comparison* of this variance with the variance 


*This comparison is valid on the assumption that the levels of N and I are samples of an infinite 
population of levels. If these are considered as fixed effects such a comparison is not valid. 
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Irrigated 


Mean plot yields in pounds 


Hot irrigated 


Levels ef 5. 
FIGURE 4. INFLUENCE OF IRRIGATION ON RESPONSE TO NITROGEN 


for N shows that N variance is not significantly larger than the N I 
interaction variance. Thus the effect of N must be interpreted only in 
terms of the irrigation level. The means for the N / interaction are 
plotted in Figure 4. Here it is seen that the response to N was much 
greater with irrigation than without. There seems to be a similar 
tendency toward less response from the second increment at both irri- 
gation levels. If the interaction is broken down to one degree of freedom 
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for linear N by irrigation and one for quadratic N by irrigation it will 
be found that the variance for linear N X Irrigation is 878.35 and for 
quadratic N X IJ is 6.25. Obviously the latter is non-significant indi- 
cating the same reduction in response to the second increment of N 
with and without irrigation. The linear responses indicate a much 
greater response to N with irrigation than without. Even though the 
reductions in response from the second increment of N were the same 
with and without irrigation it is evident that with irrigation the limit 
of economic N application will be much greater than without because of 
the greater response in each increment. 

The signiticance of the N ? I V interaction poses some new problems. 
First the N / variance and the V J variance should be compared with 
N PIV variance to determine whether N J and VI can still be in- 
terpreted as above in spite of the influence of P and V in them. NI 
variance is significantly larger than N PJ V variance so the above 
interpretation can stand. V J variance is not different from N PI V 
so the V J interpretation must be alvered. The N PJ V means are 
plotted in Figure 5 for ease of interpretation. Without irrigation there 
was little response to N. However, with irrigation there was response to 
N ard the degree of response varied with variety and phosphorus level. 
Variety A responded best to N at P; , Variety B responded best at P, or 
P; , the response at P, being linear, but at P; a reduced effect was found 
for the second increment of N. Variety C responded best to N at P, . 
Thus the response of the varieties to N with irrigation must be inter- 
preted with respect to specific varieties at definite levels of P. 

One other point of interest is the comparison of the sampling error 
and the experimental error (c). Using the notation found in Snedecor 
(4) it is found that Sz = 11.21 and Ss = 14.91. The estimate of the 
variance of a treatment mean is: 


S3 11.21 14.91 
4 R Rk 


where R is the number of plots of the treatment and k is the number of 
rows per plot. It is evident that increasing replications or size of plot 
will both reduce the treatment mean variance. However, it would seem 
that increasing replications will be more advantageous in this case since 
R is present in both terms of the equation and the two terms are the 
same size. If S; was markedly greater than Sz then there also would 
be some advantage in increasing the number of rows per plot. Whether 
additional precision is desired depends on the objectives of the experi- 
ment. For this experiment, destined to be continued for several years, 
the degree of precision seems adequate. 
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DISCUSSION AND SUMMARY 


That this design has succeeded in removing a large portion of field 
variability was confirmed by the very large components of variance 
associated with columns and with rows exclusive of varieties. The 
gain in precision may be calculated. Assuming that the irrigation plots 
would have to be split for varieties and the fertilizer treatments then 
randomized on each variety plot, and assuming that the N P K inter- 
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FIGURE 5. INFLUENCE OF IRRIGATION ON THE EFFECT OF PHOSPHORUS 
ON THE RESPONSE OF VARIETIES TO NITROGEN 
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actions are not significant, the error variance for such a design would 
be the rest of the rows plus columns plus error (c). This gives a new 
error variance of approximately 75. Dividing 37 by 75 we obtain 49 
per cent as the efficiency of an unconfounded arrangement. The re- 
ciprocal of this is 204 per cent and the gain in information due to con- 
founding is 104 per cent. 

This paper is a description of the split-plot half-plaid Latin square 
in a field irrigation experiment with fertilizer and potato varieties. 
Some of the major considerations which led to the selection of the design 
are discussed. A description is given of the method of analysis of the 
data and some of the interactions are discussed. The use of the half- 
plaid Latin square feature has resulted in a gair cf 100 per cent in 
information compared to the split-plot design. For a complete interpre- 
tation of three years’ data from this experiment the reader is referred 
to (3). 
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FITTING THE NEGATIVE BINOMIAL DISTRIBUTION 
TO BIOLOGICAL DATA 


C. I. Briss 


The Connecticut Agricultural Experiment Station and 
Yale University 


NOTE ON THE 
EFFICIENT FITTING OF THE NEGATIVE BINOMIAL 


R. A. FIsHER 


Department of Genetics 
Cambridge 


In studying the occurrence of plants and animals in nature, the 
number of individuals may be counted in each of many equal units of 
space or time. The original counts can be summarized in a frequency 
distribution, showing the number of units containing zx = 0, 1, 2, 
3, --- individuals of a given species. If every unit in the series were 
exposed equally to the chance of containing the organism, the dis- 
tribution would follow the Poisson series, each unit having the popula- 
tion mean as its expected frequency. It is easy to test whether the 
variation in the number of individuals per unit agrees with this hypo- 
thesis. Since the expected variance of a Poisson distribution is equal 
to its mean, the observed variance s’, multiplied by the degrees of 
freedom n, may be divided by the sample mean Z to obtain x” = ns’/Z. 


More often than not x’ is significantly larger than its expectation, © 


not only in distributions of plants and animals in nature but even in 
the laboratory. 

A number of distributions have been devised for series in which the 
variance is significantly larger than the mean (2, 11, 21), frequently on 
the basis of more or less complex biological models. In the present 
paper this characteristic will be called “over dispersion”. Perhaps the 
first of these was the negative binomial, which arose in deriving the 
Poisson series from the point binomial (27, 32) although it had been 
formulated in 1714 (2). Comparisons of expected and observed dis- 
tributions have shown its wide applicability to biological data. The 
relative ease with which the negative binomial can be computed and 
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other desirable properties that have been described by Fisher (12) and 
by Anscombe (2) have now been supplemented by a practicable maxi- 
mum likelihood estimation of its parameters described in the note 
by Sir Ronald Fisher at the end of the present paper. Here we will 
consider some characteristics of the negative binomial, estimates of its 
parameters and their precision, the calculation of the expected fre- 
quencies for a given sample, and its applicability to biological data. 

The Negative Binomial Distribution. The negative binomial distri- 
bution is completely defined by two parameters, the arithmetic mean m 
and a positive exponent k. It is so called by analogy with the positive 
binomial distribution, (¢ + p)”, where n’ is the number of individuals 
in a group and q and 7p are the expected proportions in two contrasting 
categories with g + p = 1. In computing the statistics of the positive 
binomial from distributions of yeast cells counted with a haemocy- 
tometer, Student (27) observed that two of his series gave negative 
values for p and n’ but nevertheless fitted his observations very well. 
These and other cases (32) were described by the negative binomial, 
(q — p)*, where p = m/k and q = 1+ p. By expansion of this ex- 
pression, the probability P, that an observational unit x will contain 
0, 1, 2, --+ individuals is 


where R = p/q = m/(k + m). The probability for a given z is multi- 
plied by N, the total number of units counted, to obtain the expected 
frequency (¢) of units with z individuals. The curve defined by the 
P.’s (or ¢’s) is unimodal, so that in fitting the negative binomial to an 
observed distribution any apparent bimodality (or multimodality) is 
attributed to random sampling. 

The negative binomial is an extension of the Poisson series in which 
the population mean m, the parameter of the Poisson distribution, is 
not constant but varies continuously in a distribution proportional to 
that of x”. As the variance of a negative binomial approaches the mean, 
or the over-dispersion decreases, k © and p— 0. Under these con- 
ditions it can be shown (13) that the distribution converges to that for 
the Poisson, P, = e ” (m*/zx!). Conversely, if the over-dispersion in- 
creases sufficiently, k + 0. If we disregard the number of units con- 
taining no individuals, the negative binomial then converges to Fisher’s 
logarithmic series (13), which describes effectively the apparent abund- 
ance of different species. Thus the limiting values of the exponent lead 
to distributions of importance in biology. 
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An example of the negative binomial is provided by counts of the 
number of European red mites on apple leaves, for which I am indebted 
to Dr. Garman of The Connecticut Agricultural Experiment Station 
(15). On July 18, 1951, 25 leaves were selected at random from each of 
six McIntosh trees in a single orchard receiving the same spray treat- 
ment, and the number of adult females counted on each leaf. The 
frequency distribution of mites on the 150 leaves is given in the first 
two columns of Table 1. 


TABLE 1 


Fitting the negative binomial to counts of red mites on apple leaves, 
data of P. Garman (15). 


No. of mites No. of leaves | Accumulated Expected (f — ¢)? 
per leaf observed frequencies frequencies 

Zz 7 Az 

0 70 80 69.49 .004 
1 38 42 37.60 .004 
2 17 25 20.10 478 
3 10 15 10.70 .046 
4 9 6 5.69 1.925 
5 3 3 3.02 

6 2 1 1.60 027 
7 1 85 

8+ 0 

Total 150 = N 150.00 2.484 = 2 


S(fz) = 172, S(fz*) = 536, S(fz*) = 2170 
Z = 1.14667 (Eq. 2), s? = 2.27365 (Eq. 4) 


As a test of agreement with the Poisson series, the observed 
variance (s°) has been computed from the basic sums beneath the 
table. It was nearly twice as large as the mean. From x’ = 149 X 
2.27365/1.14667 = 295.44 with 149 degrees of freedom, P is less than 
.001. The over-dispersion was clearly far too large for the Poisson 
series. By contrast, the frequencies expected by the negative binomial, 
shown in the fourth column of Table 1, reproduced the observed values 
very closely. In the next sections, the detailed computation of- the 
negative binomial will be illustrated with this example. 

The Statistics of the Negative Binomial. The parameters m and k of 
the negative binomial are estimated from the frequency distribution of 
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a sample by the statistics and k. The mean is estimated efficiently 
from the frequency f of units at each z as © 


= = S(fx)/N (2) 


For the distribution in Table 1 its numerical value was ¢ = 1.146667. 

The exponent k is more difficult. Two approximations (methods 1 
and 2) are available, each with a relatively high efficiency for suitable 
combinations of m and k (1, 2). With the maximum likelihood solution 
(method 3) in the note appended to this paper, either estimate may be 
used as a first step toward a fully efficient fitting. 

(1) The original and simplest solution for the statistic k is that 
based upon the first and second moments (12, 32). It is determined 
from the mean and variance s” of the sample as 

2 
(3) 


where, as usual, the variance is 


(4) 


The efficiency of the moment solution as defined by Fisher (12) has . 
been plotted by Anscombe (1, 2) for different combinations of m and k. 
It has an efficiency of 90 percent or more for small values of m when 
k/m > 6, for large values of m when k > 13, and for m in the inter- 
mediate zone when (k + m) (k + 2)/m > 15. In practice the popula- 
tion values m and k are replaced necessarily by the statistics 2 and k. 

In our example, the estimation of k, leads by Eq. 3 to ki = 
(1.14667)?/(2.27365 — 1. 14667) = 1.16670. If %, has been estimated 
with an efficiency of 90 percent or better, the inequality 2.313 x 
3.167/1.147 > 15 should hold, but since 6.39 > 15, this method is not 
suitable for the present series. 

(2) An alternative estimate (1, 2) is based upon the ratio of the total 
number of units in the sample (N) to the number of units without 
organisms (f,). From Eq. 1 the expected probability for z = 0 is 
P, = 1/q", and if P, is replaced by the proportion observed in the zero 
class, we have f./N = 1/q". Since gq = 1 + m/k, an iterative solution 
_is necessary. The required estimate of k. is that which balances the 
equation 


ky log (1 + £/k,) = log (N/fo) (5) 


The left side of Eq. 5 is computed twice, with different trial values 
k’, one giving a larger and the other a‘smaller product than the con- 
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stant term on the right of the equality. Interpolating between these 
two products for log(N/fo) leads to a first approximation of k, , with 
which the process can be continued until &, has the required precision. 
To estimate k with an efficiency of 90 percent or more, at least 1/3 of 
the units must be empty, but if the mean is less than 10, enough more 
empty units are required to satisfy the inequality (m + 0.17) (P. — 0.32) 
> 0.20. The terms in the inequality are replaced necessarily with 
m = £and P, = f,/N from the sample. 

Since the number of zero frequencies in Table 1 is relatively large, 
method 2 is the more promising. As a test of its expected efficiency, 
we find (x + 0.17) (f./N — 0.32) = .193, which is very close to the 
level for 90 percent efficiency. From Eq. 5 the required f, is the trial 
value k’ for which k’ log(1 + #/k’) = log(150/70) = .330993. The 
first trial value of k’ = 1 was based on the estimate from method 1 and 
the computation arranged as follows: 


k’ 1 + k’ log (1 + £/k’) 
1 2.14667 331767 
98 2.17007 329744 
992 2.15592 330962 


With k{ = 1, the left side of Eq. 5 was too large, so that the calculation 
was repeated with k; = .98, reversing the inequality. By interpolation 
ki = .98 + .02 (.330993 — .329744)/(.331765 — .329744) = .992, 
which slightly underestimated the required value. By interpolation 
between ki and ki , k. = .99231. 

(3) For many distributions, k cannot be determined by either of 
the above techniques with an acceptable efficiency. In these and all 
critical cases, the k computed by Eq. 3 or 5 may be considered as the 
first step toward a definitive solution. This is provided by the method 
of maximum likelihood (17, 24). By suitable arrangement of the calcu- 
lation, as developed by Sir Ronald Fisher in the appendix to the present 
paper, the procedure is practicable and rapid when the largest observa- 
tion does not exceed 20 or 30. Scores (z;) are computed from trial values 
of ki , selected so that they bracket the required estimate k, for which 
z; = 0 in the equation 


- win(1 +5) (6) 


where In designates a natural logarithm. As a first step in the compu- 
tation, the accumulated frequency 4A, in all units containing more than 
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x organisms is written opposite each z. The reciprocals 1/(k; + 2x) 
from Barlow’s Tables (to seven places or more) are multiplied in turn 
by A, and the products accumulated to obtain the summation in the 
first term. The second term may be determined as the seven-place 
common logarithm of (1 + </k;) multiplied by 2.3025851 x N. 

The first score z; (¢ = 1) is computed with the first trial value ki, 
usually based upon the k from Eq. 3. The second trial value k, depends 
upon the sign of z,. If z, is positive, k; > k{ ; if negative, k; < kj , the 
two differing just enough to give opposite signs to z, and z,. The 
third trial value kj , between k{ and kj , is obtained by linear inter- 
polation for z = 0, but the computed z, is rarely exactly zero. For 
precision, it is preferable to compute a z, with a sign opposite to that 
of z; by selecting k{ at about the same distance as k} beyond a newly 
interpolated k’ for z = 0. This provides a narrower interval within 
which the final & may be interpolated and a better estimate of its 
variance obtained. 

Since the better of the two approximations of & in our example was 
barely 90 percent efficient, the likelihood solution would be preferred. 
The cumulative frequencies exceeding each xz were first listed (Table 1), 
so that for x = 0, for example, A, = 150 — 70 = 80, and for z = 1, 
A, = 80 — 38 = 42. Starting with a trial value of k{ = 1.0, the re- 
ciprocal, 1/(k{ + x), for each x was multiplied by its corresponding A, 
and the products accumulated in the calculator to obtain the first term 
in the score z (Eq. 6). This sum has been listed separately and below 
it the second term in the score, 345.3878 & log(1 + 1.146667/k’): 


k=10 =1.05 k= 1.026 kf = 1.023 
S{A./(k’ + 2)} 114.9262 112.5247 112.7961 
—N In(1 + —114.5875 —110.7227 —112.5432 —112.7752 
2; 3387 — 3182 — .0185 0209 


Since z, was positive, the second trial k’ was increased to k; = 1.05, 
leading to a negative z. . Interpolating between them for z = 0, 
ki = 1.0 + (.3387 X .05)/(.3387 + .3182) = 1.026, which, in turn, gave 
z; = —.0185. From linear interpolation between z, and z; for z = 0, 
k = 1.0247. To insure a positive score near zero, a new trial value was 
selected of ki = 1.023. Interpolation between z, and z, gave the maxi- 
mum likelihood estimate of k = 1.02459. 

The Variances of and k. The sampling variances of the statistics 
of the negative binomial depend upon the parameters m and k, but in 
practice these are replaced necessarily by the corresponding statistics 
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# and k. The mean is computed efficiently in all cases and its variance 
is 


= (m4 ™)/w | (7) 


The error variance of the mean in the example from Table 1, computed 
with the maximum likelihood estimate of k, was V(#) = (1.14667 + 
1.28330)/150 = .01620, giving the standard error s, = .1273. 
The variance of k depends upon how it has been estimated. In 
general, the variance of & is less when k is small than when it is large. 
(1) If computed from s’ by Eq. 3, its large-sample variance (2) is 


vk) = MELD 8) 


solved with R = #/(k + @). For k, = 1.16670 from Eq. 3, R = 
1.14667/2.31337 = .49567, and V(k,) = 5.0558/36.853 = .1372, from 
which this estimate of k has a standard error V .1372 = .370. 

(2) If & is determined from the number of zero units by Eq. 5, its 
large-sample variance (2) is 


(1 — R) — RF (9) 


Vik.) = 


where RF is defined as above and In is a natural logarithm. The error 
variance in the example for k, = .99231 as estimated by Eq. 5 is solved 
with R = 1.14667/2.13898 = .53608, to obtain V(k,) = .61089/8.0709 = 
07569. The standard error of k, is 0.2751, or about 3/4. as large as 
that for k, . 

(3) The variance of the maximum likelihood estimate of k is the 
reciprocal of the amount of information about k, or the rate at which 
the score z is decreasing as it passes the zero. It is computed, there- 
fore, from the two values of z; just above and below zero, say z; and z , 
and the two trial values of k; with which they have been computed, k; 
and kj; , as 

ky — ki 
V(k) = 


(10) 
Hence the error variance of k = 1.02459 is 
V(k) = (1.026 — 1.023) /(.0209 + .0185) = .07614, 


so that the maximum likelihood & had a standard error of s; =. 2759. 
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Tests for Agreement with the Negative Binomial. Perhaps the most 
convincing test of the adequacy of the negative binomial in any given 
case is the agreement between the frequencies (f) observed at each x and 
their expected values (¢) as computed from the statistics of the sample. 
The discrepancy between them is easily tested by x’. 

The expected frequencies are computed with Eq. 1, most readily in 
succession and starting with the number expected at zt = 0. This first 
expectation is determined with the aid of 7-place logarithms as 


= N/q (11) 
and the succeeding entries for a = 1, 2,3, --- as 
k —1 
= (12) 


More decimal places should be retained in the calculator at each stage 
than need be recorded, so as to avoid accumulating rounding erro-s. 
The observed and expected frequencies are then compare by x’, where 


2 
— ) \ (13) 
x’ has three fewer degrees of freedom than the number of ratios that 
are summed. As usual, the frequencies with small expectations are 
pooled, preferably so that no expectation is less than 5. If x’ shows 
good agreement, between the matched frequencies, no other test may 
be needed, especially if both @ and k are efficient estimates. 

In the example, the expected frequency for x = 0 was determined 
with Eq. 11 as the antilog of log (150) — 1.02459 log (2.11915) or 
¢ = 69.4879, giving the initial entry in the fourth column of Table 1. 
From p = 1.11915 and g = 1 + p, R = 1.11915/2.11915 = .528113 
and by Eq. 12, ¢, = 1.02459 X .528113 X 69.4879 = 37.5999, ¢@ = 
2.02459 X .528113 X 37.5999/2 = 20.1011 and so on for successive 
values of z. To avoid rounding errors, six significant figures were 
carried in the calculation, although the ¢’s ‘were recorded only to two 
decimal places. The final value, for x = 8+, was obtained as the 
difference between 150 and the sum of preceding ¢’s. The calculation 
of x’ for the discrepancy between the observed and expected frequencies 
(Eq. 13) is shown in the last column of Table 1, pooling the frequencies 
for x > 5 so as to avoid expectations of less than¢@ = 5. The resulting 
x’ = 2.484 with three degrees of freedom and P = 0.48 indicates good 
agreement with the negative binomial. 

The comparison of observed and expected frequencies by x” may be 
distorted by chance irregularities in the individual entries. Thus in 
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testing agreement with Neyman’s contagious distribution, Beall (6) 
“smoothed” some of his more uneven observed frequencies before 
computing x’. This difficulty can be avoided by testing the agreement 
of the observed with the expected second and third moments of the 
negative binomial. The relation of these tests to alternative formula- 
tions, such as the logarithmic, the discrete log-normal, and the ‘‘con- 
tagious” distributions, has been considered by Anscombe (2). The 
moment tests have the further advantage that they take account of the 
few large values which are missed by grouping the tail of an observed dis- 
tribution in computing x’. 

Two tests have been described by Anscombe (2), each having the 


- form of a difference between an observed and an “expected” moment. 


Although m in each test is estimated efficiently by Z, its variance has 
been derived on the assumption that an efficient estimate of k is not 
available for computing the expectation. A variance so derived may not 
apply when the expected moment is estimated by maximum likelihood. 
Hence the test criteria T and U are defined in terms of the estimates for 
which the variances are known. These variances, however, should 
always be computed with the best available estimate of k, usually that 
derived by maximum likelihood (é). 

The difference T between the third moment of the sample and its 
value predicted from the first two moments of the same sample of a 


negative binomial is 
3 2 
r= _ i} (13) 


where 
[x*] = S{f@ — = S(fx*) — 3ZS(fx*) + (14) 


The significance of the difference 7’ is determined by comparison with 
its standard error, the square root of its large-sample variance 


V(T) = 2m(k + 1)p'q’[2(3 + 5p) + 3kq]/N (15) 


The variance of 7 should be computed with estimates of p, g and k 
based upon the maximum likelihood & when it is known. Although the 
expected third moment may be determined more accurately from the 
same maximum likelihood estimates as g(q + p)m, the variance in Eq. 
15 is then of doubtful applicability. 

When the observed second moment is compared with its expectation 
computed with k, , the difference 


U=s — 2°/k,) (16) 


aks 
4 
| 
tee 
- 
alt 
| 
thoy 
| 
i 
4 
in 
Ur | 
We 


NEGATIVE BINOMIAL DISTRIBUTION 


has the large-sample variance 

2 
V(U) = 2m(k + =) / N+ (17) 
V(ks) is defined in Eq. 9 but computed with the maximum likelihood 
estimace / if this is known, as are the other terms in Eq. 17. Here 
again the expected second moment, gm, can be estimated more exactly 
with the maximum likelihood value for k but the applicability of the 
veviance in Eq. 17 is then in doubt. 

The differences between the observed and expected moments are 
much easier to compute than their standard errors. From [z*] = 
778.47 (Eq. 14) ie present example had an observed third moment 
of 5.1898 and an expected value from the first two moments, Z and s” of 
6.7429, so that 7 = --1.553 by Eq. 13. Since its variance by Eq. 15 
was 4.1272, the standard error of 7’, 2.032, showed no discrepancy from 
a negative binomial. The corresponding difference for the second 
moment and its error has been computed by Eqs. 16 and 17, to obtain 
U = —0.198 + 0.302, again in agreement with the negative binomial. 

Models for the Negative Binomial. When the number of individuals 
per unit of space or time in repeated counts cannot be assumed to have 
the same expected value, they may represent a mixture of several 
homogeneous Poisson distributions. The number in each unit is re- 
stricted to the integers but this is not true of the expectations or means. 
In a mixture of Poissons, the means represent a positive continuous 
variate. The simplest frequency distribution which they might follow 
is the Eulerian distribution or the Pearson type III curve, and if in fact 
tney are so distributed the observations will conform to the negative 
binomial. 

The expected frequency may be known to vary within an observed 
distribution. A case in point is the distribution of bacterial clumps 
over a milk film (20). At least two disturbing factors were involved. 
In preparing a film, 0.01 milliliter of milk was placed on a microscopic 
slide and spread with a needle over an area of one square centimeter. 
Bacteria caught by surface tension on the lower surface of the drop 
adhered to the glass slide on contact and thus increased the concentra- 
tion of bacteria in this area of the film. Secondly, the fresh drop did 
not have the same thickness over the entire square centimeter but due 
to surface tension was thicker in the center than in the margins, so 
that more bacteria were deposited in the center. Despite these two 
factors, milk meeting public health standards had so low a bacterial 
count that the distribution of bacterial clumps per microscopic field 
was seldom distinguishable from a Poisson. However, when the 
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TABLE 2 


Observed distributions of bacterial clumps per field (Obs. f) in a milk film (20) and of 

yeast cells per square in a haemocytometer (27) and the expected frequencies com- 

puted from the negative binomial (Bin ¢) and from the Neyman type A (Ney ¢) 
distributions (21). 


No. per Bacterial clumps Yeast cells 
unit Obs. Bin. Obs. Bin. Ney. 
z f f 
0 56 64.2 213 214.2 214.8 
1 104 90.3 128 122.8 121.3 
2 80 82.7 37 45.0 45.7 
3 62 62.1 18 13.4 13.7 
4 42 41.6 3 3.5 3.6 
5 27 25.8 1 9 8 
6 9 15.1 +.2 +.1 
7 9 8.5 
8 5 4.7 
9 3 2.5 
10 2 1.3 
ll +1.2 
19 1 
N, P(x?) 400 54 400 .19 18 


bacterial count was high, the distribution of clumps per field reflected 
this known heterogeneity and then was often a negative binomial, as 
in the example in Table 2. This phenomenon of substantial agreement 
with the Poisson at low population densities and with the negative 
binomial at higher densities has been observed with both plant and 
animal populations (4, 5). A lack of randomness in microscopic counts 
was observed by Student in counting yeast cells with a haemocyto- 
meter (27). In fact, this seems to be the first case to be fitted with a 
negative binomial. The original counts are given in Table 2, together 
with the expected negative binomial frequencies as computed by 
maximum likelihood and those computed by Neyman (21) with his type 
A contagious distribution. 

The distribution of insect pests is so seldom uniform that most 
experiments on insect control are randomized and replicated. In a 
field experiment of this type on the corn borer, four treatments were 
arranged in 15 randomized blocks (26). At the end of the season, 
eight hills of corn were selected at random in each plot and the borers 
recorded from each hill. This experiment has been reported both in 
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TABLE 3 

Distribution of corn borers (Obs. f) in a field experiment arranged in 15 randomized blocks, where 

treatment 1 is the untreated control or check (6). The expected freq ies for the negative binomial 


(Bin. @) have been computed independently for each treatment with the statistics in the last row of the 
table; those expected for the Neyman type A (Ney @) are from Beall (6). 


Borers Treatment 1 Treatment 2 Treatment 3 Treatment 4 
per hill Obs. Bin. Ney. | Obs. Bin. Ney. | Obs. Bin. Ney. | Obs. Bin. Ney. 

0 19 16.6 34.4 24 19.6 22.6 43 44.3 49.8 47 45.3 53.4 
1 12 18.5 6.4 16 22.2 16.7 35 31.1 23.3 23 «30.1 19.7 
2 18 16.9 10.4 16 «19.7 18.3 17 19.1 18.9 27) «18.4 17.5 
3 18 14.5 11.9 18 15.9 16.4 ll 11.2 12:3 9 11.0 12.1 
4 11 11.9 11.2 15 12.1 13.4 5 6.4 7.3 4 6.4 7.5 
5 12 9.5 9.5 9 9.0 10.3 4 3.6 4.1 3 3.7 4.4 
6 7 Ts tt 6 6.5 7.5 1 2.0 2.2 1 2.1 2.5 
7 8 5.9. 6.4 5 46 5.2 2 » 5 1 1.2 1.4 

8 4 4.5 5.2 3 3.3 3.5 2 
9 4 3.5' 4.1 4 2.3 2.3 +1.2 +1.0 +1.8 41.5 
10 1 2.7 3.2 2 22 438 1 
li 2.0 2.5 23 9 1 
12 1 its 1 
13 1 1.2 1.4 +2.1 +1.4 
15 1 
17 1 +3.3 +3.6 
19 1 
26 1 
N, P(x?) 120 «4.65 | 120 120 .88 .09 | 120 .16 .09 
Zk 4.033 1.532 3.167 1.764 1.483 1.333 1.508 1.190 


terms of the total number of borers per plot (26) and as frequency dis- 
tributions showing for each <reatment the number of hills with z = 1, 

-- borers (6). From an analysis of variance of the plot totals (in 
logarithmic units) the level of borer infestation varied significantly 
from block to block (P < .01). In consequence, the composite dis- 
tributions from the 15 plots for each treatment (Table 3) represented 
unequal levels of infestation. Negative binomials have been fitted 
separately to each of them. Since the expected frequencies agreed well 
with the observed values, the data are consistent with the hypothesis 
that each represented the sum of several Poisson distributions of 
unequal means. 

The distributions in Table 3 might arise nek a different model 
for the negative binomial in which the non-randomness is attributed to 
“contagion’’, in this case, a result of the larvae hatching from eggs 
that were laid in masses. Contagion, in fact, was the basis for the 
Neyman type A distribution developed originally for these observations 
(21, 6) as described in the next section. With a different mathematical 
formulation it leads also to a negative binomial. ‘Contagion’ was one 
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TABLE 4 
Distribution of the number of accidents experienced by machinists (f) and their negative binomial 
expectations (@) (16). Distribution of soil bacteria in microscopic counts, showing the colonies per 
field fitted with the Poisson series, the bacteria per colony with a logarithmic series, and the bacteria 
per field with a negative binomial (18). 


Accidents No. of Colonies No. of Bacteria No. of Bacteria No. of 
perma- machinists per fields per colonies per fields 
chinist field Obs. Cale. colony Obs. Cale. field Obs. Cale. 

0 296 296.7 0 11 14.6 1 359 2.1 0 11 13.0 
1 74 71.0 1 37 40.9 2 146 =«:136.1 1 17 21.0 
2 26 26.4 2 64 57.2 3 57 68.3 2 31 24.6 
3 8 11.0 3 55 53.4 4 41 38.5 3 24 25.4 
4 a 4.8 + 37 37.4 5 26 23.2 4 29 24.2 
5 4 2.2 5 24 20.9 & i7 14.5 5 18 22.0 
6 1 1.0 6+ 12 15.6 7+ 27 30.3 6 19 19.4 
7 ) 7 16 16.7 
8 1 -2 8 13 14.1 
9 +.2 9 17 11.7 

10 6 9.6 

1l 8 7.8 

12+ = 30.5 

N, P(x?) 414 .57 240 .63 673 56 | 240 .52 


of the explanations proposed by Student (28,, who wrote, “If the 
presence of one individual in a division increases the chance of other 
individuals falling into that division, a negative binomial will fit best, 
but if it decreases the chance, a positive binomial”. This explanation 
has figured prominently in the study of accident statistics (3, 16). 

An early example is the distribution in Table 4 of accidents ex- 
perienced by 414 machinists in three months, where the observed 
frequencies are matched satisfactorily by those comput+d with a negative 
binomial. If each machinist had had the same initial probability of 
being involved in an accident but if this probability were increased (or 
decreased) by his having an accident, contagion would be present and 
a negative binomial distribution could result. However, an opposite 
assumption leads to exactly the same expected distribution. If ex- 
periencing an accident had no effect upon the risk of another accident, 
but if the individual machinists or their shops or intervals within the 
three month period differed in their accident-proneness, a negative 
binomial would also result. Hence, the appearance of “contagion is not 
inherent in nature but simply in our method of sampling” (11). The 
relation of these two models, a mixed or compound Poisson distribution 
without contagion and contagion which changes the odds of further 
events, is discussed ably in the recent monograph by Arbous and Kerrich 
(3). Alternative distributions have been developed from other math- 
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TABLE 5 
Observed distributions of quadrat counts (Obs. f) of Lespedeza capitata and of Liatris aspera in an old 
field association (30) and of Primula auricul in a grassland association (7), and their expectations with 
the negative binomial (Bin. @), Neyman contagious type A (Ney @) and Thomas double Poisson 
(Thom distributions. 


Plants Lespedeza capitata Liatris aspera Primula 
per quadrat Obs. Bin. Ney* Thom Obs. Bin Thom Obs. Bin 
0 7178 7178.1 7188.4 7279.2 7403 7403.1 7420.4 26 23.6 
1 286 283.7 219.6 105.2 183 179.8 140.0 21 26.1 
2 93 95.0 140.8 127.9 34 40.0 62.3 23 20.9 
3 40 41.1 61.6 78.6 14 11.5 14.4 14 14.7 
q 24 19.8 21.1 33.1 4 3.7 2.4 1l 9.5 
5 7 10.2 6.2 11.1 1 1.3 0.4 4 5.9 
6 5 5.4 i 3.4 1 4 0.1 5 3.5 
7 1 2.9 5 1.0 +.2 4 2.1 
8 2 1.6 y +.5 1.2 
9 1 9 1 By 
10 2 5 +.8 
1 3 
12 +.5 
N, P(x?) 7640 84 <.001 <.001 7640 46 <.001 109 .70 


*Computed by maximum likelihood (30), all other Neyman Type A expectations fitted by moments. 


ematical definitions of ‘‘contagion’”’ and will be considered in the next 
section. 

The negative binomial may result from a different but related 
model. In counts of soil bacteria, Jones and Mollison (18) recorded 
for each microscopic field both the number of bacterial colonies and 
the number of bacteria in each colony. The number of colonies per field 
agreed well with the Poisson expectation for a random distribution and 
the number of bacteria per colony in the same counts with Fisher’s 
logarithmic distribution. Under these conditions, the expected distri- 
bution of bacteria per field is a negative binomial (23), as confirmed 
by the agreement of the expected and observed frequencies (Table 4). 

Quadrat counts in plant ecology which departed from Poisson have 
been attributed to the occurrence of plants in “clumps”. Blackman (7) 
noted that Primula auricula reproduces vegetatively by short rhizomes, 
so that older individuals are often surrounded by younger plants. In 
an old-field community (30) both Liatris astera and Lespedeza capitata 
tended to occur in clumps. The distributions of clumps per quadrat 


and of plants per clump have not been reported separately, so that the 
model cannot be tested directly. However, the distribution of plants 
per quadrat in all three cases agreed excellently with the negative 
binomial (Table 5). 
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TABLE 6 
Observed animal distributions (f) and their negative binomial expectations (@) of Microcalanus nauplii 
in samples of marine plankton (also fitted with a Neyman type A) (5), of Tanytarsus in Ekman hauls 
(19), of Oligochaetes in Petersen hauls (19), of isopods under boards (9), and of the mite Liponysus 
bacoti on rats in Savannah (10). 


Individuals Microcalanus Tanytarsus | Oligochaetes Isopods Mites 

per unit Ney 
0 1 8 32 29.5 39 34.7 28 30.2] 160 160.0 
1 2 8 1.9 28 32.5 24 29.6 28 «21.6 19 15.9 
2 4 2.1 3.7 25 29.0 18 23.6 14 16.2 ll 8.5 
3 3 4.3 5.8 34 «023.9 21 18.3 ll 12.3 6 5.8 
4 5 7.1 8.0 13 «18.9 15 «14.1 8 9.4 5 4.3 
5 8 9.9 10.0 14 14.5 15 10.7 ll 7.3 4 3.4 
6 16 12.4 11.6 17 10.9 6 8.2 2 5.6 4 2.8 
rf 13 14.0 12.5 5 8.1 8 6.2 3 4.3 3 2.4 
8 12 14.7 12.9 6 6.0 6 4.6 3 3.4 2 2.0 
9 13 14.5 BF 1 4.4 2 3.5 3 2.6 2 1.8 
10 15 13.5 11.9 9 3.2 1 2.6 3 2.0 1.6 
ll 15 12.0 10.9 2.3 2 2.0 2 1.6 1.4 
12 9 10.3 9.6 1.7 3 1.5 1.2 1 1.2 
13 9 8.5 8.2 2 1.2 3 i 1 9 1.1 
14 7 6.8 6.8 1 9 8 2 7 1.0 
15 4 5.2 5.5 6 1 6 1 6 2 9 
16 4 4.0 4.4 4 +1.9 4 8 
17 6 2.9 3.4 1 3 2 3 8 
18 2 2.1 2.6 1 .2 +1.4 1 a 
19 1.5 1.9 +.5 1 6 
20 2 1.1 1.4 6 +10.0 
21 1 
22 +1.5 +3.5 

N, P(x?) 150 .89 .64 189 .14 | 164 .53 | 122 .34 | 227 .59 


Other models have been developed for the negative binomial (2) and 
there is no reason to suppose that the possibilities have been exhausted. 
This is suggested in part by the variety of its applications to biological 
data. Some applications to fresh-water dredge samples (19) and to 
marine plankton (5) are shown in Table 6. It has formed the basis for 
a sequential sampling scheme for tapeworm cysts in whitefish (22). It 
has described effectively the distribution of insects in the field, in- 
cluding the beet leafhopper (8) and the wireworm (31), of ticks on 
individual sheep (12), of mites on rats (10) and of isopods under boards 
(9) (Table 6). Some failures in fitting can be ascribed to the inefficiency 
of the moment estimate of k. One of these is a count of Ribes on Mt. 
Spokane (14, 31), where Equation 3 gave k, = .134 and Equation 6 
gave k = .205. Even though the estimates did not differ significantly, 
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TABLE 7 
Distributions of quadrat counts with apparent bimodality (Obs. f) and their expected frequencies for 
the negative binomial (Bin @), Neyman type A (Ney @) and Thomas double Poisson (Thom @) dis- 
tributions, representing three species in a salt marsh, Salicornia stricta, Plantago maritima and Ameria 
maritima (4, 29), and a weed on arable land, Chenopodium album (25). 


Plants per Salicornia Plantago Ameria Chenopodium 
quadrat | Obs. Bin Ney | Obs. Bin Thom| Obs. Bin Ney Thom} Obs. Bin Thom 
0 4+ 3.3 10.7 12 7.6 11.0) 57 54.1 54.9 56.4 19 9.2 19.0 
1 3 6.4 4.0 8 11.3 6.7 6 16.2 7.6 5.6 5 13.5 5.0 
2 8 8.4 6.5 9 12.4 10.7 12 9.0 10.1 19.0 6 14.3 9.7 
3 13 9.4 7.8 13 «(12.0 11.2 5 5.8 9.0 9.6 9 13.0 10.6 
4 ll 9.6 8.1 6 10.8 10.8 5 3.9 6.5 6.8 &§ HO 86 
5 9 e232 78 8 9.3 10.0 5 28 43 4.3] 20 8.8 8.3 
6 8 8.4 7.6 11 7.8 8.8 7 2.0 2.8 28] 14 6.8 7.2 
7 10 73 Th ¥ 6.4 7.4 1 1.5 1.8 1.8 8 6.2 6.0 
8 3 6.5 6.4 8 5.1 6.0 4 3.8 4.9 
9 3 5.6 5.7 7 4.0 4.7 1 8 9 By 3 2.8 3.8 
10 8 4.7 49 3 3.2 3.6 1 6 4 4 2 2.0 3.0 
11 3 3.1 4.2 4 $3 2.7 +4.6 +7.9 

12 4 2.5 3.5 1 19 2.0 +2.2 +.3 +.5 
13 4 oi 22 1 1.4 1.4 
14 1.7 2.4 
15 3 13 19 
16 1.0 1.5 1 +4.3 +3.% 
17 t2 
18 1 6 9 
19 1 
20+ 3 +5.9 +2.9 
N, P(x?) | 98 48 100 100 95 <.001 <.001 


the x’ test for the agreement of the observed and expected frequencies 
from the two estimates gave P = .017 and P = .25 respectively. 

Solely as a method for summarizing a set of observations with two 
statistics, one of them the mean, the negative binomial should be of 
increasing interest to biologists. An adequate fit of this distribution 
to the data may serve to justify further statistical analysis such as 
sequential sampling (22) or a transformation for stabilizing the variance 
preparatory to the analysis of variance. But since several quite different 
models might possibly underlie data which conform to the negative 
binomial, one cannot use this agreement as the sole basis for justifying 
a particular model or conclusions based upon it. 

Comparisons with Other Distributions for Over-Dispersion. Although 
the negative binomial is the easiest to compute and the most widely 
applicable of the distributions for over-dispersion, several others have 
been proposed. Some of these have two or more modes, while the nega- 
tive binomial has only a single mode. In fitting the negative binomial 
to observed distributions, any “extra’’ modes are assumed to represent 
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random variation, as in fitting the negative binomial to the distributions 
of corn borer for treatments 1 and 2 in Table 3, of Tanytarsus and of 
Microcalanus in Table 6, and of quadrat counts in Table 7. In some 
cases, this assumption worked well, in others passably and in a few 
cases badly. Three two-parameter distributions have been described 
which may have two or more modes, the Neyman contagious type A, 
the Thomas double Poisson and the Polya. Each has been based upon a 
mathematical model of biological interest. 

The Neyman contagious type A distribution (21) assumes an initial 
population in which groups are dispersed uniformly, within the limits of 
the Poisson, over the area (or period) represented by the final counts. 
Individuals then move out from these random centers independently 
but at too slow a rate to equalize their dispersion over the entire area. 
As numerical examples, Neyman cites the distribution of corn borers 
following treatment 2 (Table 3) and Student’s haemocytometer counts 
of yeast (Table 2). In accord with theory, corn borer eggs are laid in 
masses and the larvae on hatching tend to migrate to neighboring corn 
plants. The Neyman expected frequencies, as fitted by Beall, are 
shown in Table 3. For treatment 2 they reproduced the observations 
better than the negative binomial but the reverse was true for the re- 
maining three treatments. In the Armeria counts of Table 7, the 
Neyman type A again reproduced the bimodality which was missed. by 
the negative binomial, but when fitted to unimodal distributions the 
negative binomial did as well or better (Tables 2, 5, 6, 7). Fracker and 
Brischle (14) fitted the Neyman type A curve to six series of Ribes 
counts. None of them approached agreement but five of the six agreed 
well with the negative binomial when computed by method 1 by Wadley 
(31) and the sixth agreed when fitted efficiently as noted above. Despite 
the advantage of a potentially multimodal curve, the range of distribu- 
tions fitted by the Neyman curve seems to be more restricted than with 
the negative binomial. 

The Thomas double Poisson distribution (29) is also potentially 
multimodel with peaks that are somewhat more sharply defined than in 
the Neyman type A. It assumes that the number of plants per quadrat 
can be broken down into two Poissons, one of the number of cluster 
centers and the other of the number of additional plants (after the first) 
in a cluster. Thus individual plants of Plantago maritima tended to be 
grouped, possibly because their inflorescences are short compact spikes 
which frequently fall to the ground with the seeds still in the capsules 
(4). Frequencies computed by moments for this and other series with 
the Thomas distribution are given in Tables 5 and 7. They closely 
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resemble the expectations for the Neyman type A and have similar 
advantages and limitations. } 

Under certain conditions, the Polya distribution (2) may have two 
modes with somewhat larger frequencies at z = 0 than in the negative 
binomial, which in other respects it closely parallels. Under certain 
conditions, the Polya distribution might represent the number of in- 
lividuals per quadrat in a growing population better than the negative 
»inomial (2). If, for example, the original progeniters were released all 
at the same time rather than continuously over ‘a period, the Polya 
distribution would be indicated, but if the individual rates of birth and 
death were constant and immigration occurred at a constant rate, the 
negative binomial would be more appropriate. So far as most biological 
distributions are concerned, the Polya seems to be a minor variant of 
the negative binomial. 

A quite different alternative is provided by Fisher’s logarithmic 
distribution (13). In studies of species, area and abundance, this has 
been applied effectively to the number of species (f) represented by 
x = 1, 2, 3, --- individuals, in the catches of insect light traps for 
example (33). If k in a negative binomial approaches zero and we omit 
the zero class, the observed frequencies can be considered a sample 
from a logarithmic distribution. Williams (33) has fitted the logarithmic 
distribution to Buxton’s data on the number of Hindu male prisoners 
in a south Indian jail with x = 1, 2, 3, --- lice per head, omitting the 
612 individuals (see Anscombe’s correction, 2) who were free of lice. 
The observed frequencies agreed far better with the logarithmic ex- 
pectations than with those computed from the negative binomial by 
method 1. In this instance, however, method 1 has an efficiency of less 
than 50 percent and method 2 and efficiency of nearly 98 percent. 
When recomputed with ¢ = 6.9357 + .5634 ahd k, = .144198 + .008195, 
the divergence between the expected and observed frequencies (Table 8) 
was well within the sampling error (x* = 31.74, n = 32). Neverthe 
less, the observed second and third moments were significantly larger 
than their expectations, with U = 243.3 + 38.9 and T = 26488 + 2048. 
Yet, k, was significantly larger than zero, its expected value for a 
logarithmic distribution. 

Reexamination of the data showed that four of the 1,073 prisoners 
had more than 200 head lice. When k, was recomputed without these 
individuals, the second and third moments were no longer discrepant, 
with U = 22.86 + 25.68 and T = 575 + 3297. This example indicates 
the disproportionate effect upon 7 and U of a very few large values of z. 
With these omissions, the expected frequencies agreed still: more closely 
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TABLE 8 


The observed distribution (f) of lice of all stages on the heads of Hindu male prisoners in Cannamore, 

South India, in 1937-39, and the frequencies computed with the negative binomial (Bin) from all of 

the data (®), and from all except four prisoners having more than 200 lice (¢’), compared with the 

frequencies computed for Fisher’s logarithmic distribution (Log @) by omitting the 612 prisoners 
without lice (33, 2). 


Lice No. of heads Lice No. of heads Lice No. of heads 
per per per 
head Obs. Binomial Log | head Obs. Bin Log head Obs. Bin Log 
0 612 612.0 612.0 ll 3 96 8.4 22-23 8 8.3 7.0 
1 106 86.5 90.5 107.2 i2 10 8.7 7.6 24-25 7 74 82 
2 50 48.5 50.8 52.8 13 8 8.0 6.9 26-27 10 6.6 5.6 
3 29 «33.9 35.5 34.7 14 6 7.4 6.3 28-29 6 6.0 5.0 
4 33° 26.1 27.3 25.6 15 3 6.8 5.8 30-32 2 7.9 6.7 
5 20 21.2 22.1 23.2 16 6 6.3 5.4 33-35 7 6.9 5.9 
6 14 17.8 18.5 16.6 17 7 5.9 5.0 36-38 9 6.0 5.2 
7 12 15.3 15.8 14.0 18 - 6.5 4.6 39-41 5 5.3 4.6 
8 18 13.4 13.8 12.1 19 7 5.1 4.3 42-45 5 6.1 5.3 
9 11 11.9 12.2 10.6 | 20 7 48 4.1 46-49 7 5.2 4.6 
10 11 10.6 10.9 9.4 21 3 4.5 3.8 50+ 27. «437.5 37.5 


with the observed values, as evidenced by the expectations for r = 0 
to 10 in Table 8. The agreement of the negative binomial expectations 
with the observed frequencies of mites on individual rats in Table 6 
and of ticks on sheep as given by Fisher (12) suggest a greater usefulness 
for the negative binomial than for the logarithmic distribution in 
studies on ectoparasites. 

Fitting a Single k to Several Negative Binomial Distributions. The 
analysis and interpretation of most biological data are facilitated by 
stability in the variance, so that only means need to be compared. A 
similar stability in the coefficient k would both increase the utility 
of the negative binomial and increase our confidence in its suitability 
for a given problem. The assumption of stability is essential in some 
cases as in the sequential sampling of whitefish for tapeworm cysts (22). 
By a simple expansion of the maximum likelihood solution, a combined 
k. can be computed from a series of distributions and their homogeneity 
in respect to k tested by x’. 

The calculation consists of computing the score z for each component 
distribution with the same trial values of k’ and adding the scores for 
each k’ over all component distributions. Trial values are selected until 
two sums of the scores, S(z), are obtained which closely bracket 0. By 
interpolation between them, the required estimate k, is that for which 
S(z) = 0. If these sums are designated as z, and z, from corresponding 
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trial values of kj and ki , the error variance of k, may be computed by 
equation 10. 


The homogeneity of the k’s in the component distributions depends 
upon the z; and z, in each individual series for kj and ki. The ratio 
23(ky — ks)/(@s — 24) (18) 


is computed from each component distribution and from the totals over 


TABLE 9 


Calculation of a combined &, by maximum likelihood from the four distributions of corn borer in 
Table 3. N 1In(10) = 276.310212. 


Calculation of score with Eq. 6 for -003 23? 
Treatment Term 
No. =1.5 ke ke’ ky’ 1.473] — 
1 S{Az/(k’ + 2)} 156.6745 164.1457 158.8287  158:6100 
N In(1 + 2/k’) 156.6387 162.7297 158.4110 158.2317 
25 .0358 1.4160 .3783 
2 S{Az/(k’ + 2)} 138.3464 145.2368 140.3318 140.1302 
+ 136.1976 141.8773 137.8480 137.6810 
2% 2.1488 3.3595 2.4838 2.4492 .5349 
3 S{Az/(k’ + 2)} 81.5615 86 .2613 82.9113 82.7742 
N In(1 + 2/k’) 82.5085 86 .6970 83 .7206 83.5979 
— .9470 — 4357 — .8093 — .8237 1365 
4 S{Az/(k" + 2)} 81.3606 85 .9803 82.6881 82.5532 
N + 2/k’) 83.5105 87.7329 84.7322 84.6083 
—2.1499 -1.7526 —2.0441 —2.0551 1.1305 
Total Ratios 1.8242 
S(2%) — .9123 2.5872 .0481 — .0001 
x? = 1.8241 


By interpolation k, = 1.47145, V(k-) = .003/(.0481 + .0513) = .03018 by Eq. 10. 


all distributions. The sum of the ratios computed from the g individual 
distributions for kj and kj is diminished by the ratio computed from 
the corresponding totals of z, and z,. The difference is x’ for testing 
the homogeneity of k with g — 1 degrees of freedom. In solving this 
expression, it is immaterial which boundary value, z; or z , is used, 
since the same x’ is obtained with either one. 

The procedure can be illustrated with the four distributions of 
corn borer in Table 3, representing four different treatments in the same 
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field. The calculation requires for each treatment the mean Z and the 
accumulated frequencies A, corresponding to those in the third column 
of Table 1. Starting with a trial value of k{ = 1.5, the calculations in 
Table 9 gave S(z) = —.9123, so that a smaller trial value, k; = 1.4, 
was selected next, giving S(z) = 2.5872. By interpolation between 
kj and kj for S(z) = 0, k; = 1.47 and from the resulting S(z) by further 
interpolation, kj = 1.473. Interpolation between k; and kj gave the 
maximum likelihood estimate k, . Testing the homogeneity of k over 
the four treatments required the ratios in the last column of Table 9, 
from which x” = 1.824 with three degrees of freedom. We conclude 
that the k’s for the four treatments did not differ significantly and 
could be represented by a single value of k, = 1.4715 + .1737. 

Summary. In analyzing biological counts for which the variance 
is significantly larger than the mean, the value of the negative binomial 
distribution is enhanced by a simplified maximum likelihood method 
for estimating the coefficient k. The calculation is illustrated in detail 
with a numerical example and compared with other estimates of the 
same parameter. The error variance of the statistics of the negative 
binomial and tests for its agreement with a set of observations are 
illustrated with the same example. Models underlying the negative 
binomial are reviewed with reference to observed distributions of both 
plants and animals. A comparison with other distributions for over- 
dispersion suggests that the negative binomial is the most widely 
adaptable and generally useful of those that have been proposed so far. 
Finally, the maximum likelihood estimation is extended to the calcula- 
tion of a single coefficient k from a series of similar distributions and 
the testing of their homogeneity by x’. 
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NoTE ON THE EFFICIENT FITTING OF THE 
NEGATIVE BINOMIAL 


R. A. FisHEer 
When it is desired to examine the representation of data having a, 


counts of x, for values of z from 0 upward, by means of the negative 
binomial distribution, in which the expectation of a, 


_wk+z-)! 
E(a.) = N —1)! (1 + 
is expressed in terms of two parameters p and k, it is well known that 
the equation of estimation based on the mean 


pk = 
is fully efficient. 


A second equation, with efficiency varying with the circumstances, 
may be taken from the second moment or variance 


pp+ 


or, among other ways, from the frequency of zeros 


(1+ p)* = N/a, 


In 1941, the author gave a number of rules (12, p. 185) for judging 
when the first of these is of adequate efficiency, and in 1950 (2), Anscombe 
has examined more fully the conditions of efficiency of both of these 
approaches. Many, however, will wish to use these methods only as a 
first step towards a fully efficient fitting, and the procedure for doing 

- this, whatever means are used for a first orientation, is perhaps worth 
setting out. 

Efficient scoring for k. From the primary expectation 


—1)! (1 + p)**? 


we have (using natural logarithms throughout) 


m, = E(a,) = n 
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whence 


S(za,) sca, 


(log my} er 


ap ~ pl ; p) 
If, therefore, we choose p such that 
p = 
the likelihood will be maximized for variation of p. 
The second equation for maximum likelihood is derived from 


(log m,) = F(k — 1) — 1) ~ log (1 + p) 
where F(z) stands for 


log (2!) 
and 
F@) — Fe — 1) = 1fz. 


The efficient score for k is therefore 


s{a. (og m)} 


1 1 1 £ 


Fe In calculating the numerical value of this score for any trial value 
*s ; k, it is convenient first to add up the series of observations from the 
highest value backward, so that A, is the number of observations 
exceeding 2, i.e. 


A, = + + ad inf. 


Then the convenient expression for the score is 


(A+) — 1 tog (1 + 2) 


Trial values are then not difficult to evaluate. The value of k having 
maximum likelihood is k, that for which the score vanishes; the corre- 
sponding value for p is #/k, and the amount of information about k is, 
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as usual, the rate at which the score is decreasing as it passes the zero. 
Hence, the sampling variance and the standard deviation of the estimate 
may be calculated (p. 182). 
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THE FITTING OF MULTI-HIT SURVIVAL CURVES 
A. W. 
Oak Ridge National Laboratory 


1. Introduction. 


Because of the rapid growth of research in the field of atomic energy 
for both military and civilian uses, biologists are placing more and more 
emphasis on the study of the effects of radiation on living organisms. 
The use of microorganisms in radiation studies has increased steadily 
since they are in general easier and cheaper to work with than higher 
forms of life and since the results of such research may be used to define 
areas of investigation with more expensive material. For the most part 
data from experiments with microorganisms are survival proportions of 
cultures exposed to different experimental conditions, frequently varying 
amounts or intensities of radiation. For this reason the interpretation 
of survival curves plays a very important role in the field of modern 
radiobiology. 

Among research workers who have realized this fact are Atwood and 
Norman (1949) who have discussed rather thoroughly the theoretical 
approach to survival curves from several different points of view. 
Graphical methods of estimating parameters have Been suggested by 
these and other authors, but the problem of estimating sampling errors 
seems to have been overlooked. The purpose of this paper is to present 
some statistical methods for obtaining parameter estimates and their 
standard errors. Although the analysis of a single-hit curve is simple 
and straightforward, it will be presented first for completeness. 


2. The single-hit curve. 


If a population consists of organisms each having one sensitive unit 
inactivation of which causes loss of viability, the probability that the 
unit is not hit when exposed to a dose zx of radiation is assumed to be 
e ** where k is some positive constant independent of dose. It follows, 
therefore, that the expected proportion of a population surviving a dose x 
is equal to e“*._ If several cultures of the same population are exposed 
to different doses x; (¢ = 1, --- , p) and the observed proportions surviv- 
ing are S; , the equation* 


(1) y; = log S; = —ka, 


*Throughout the paper all logarithms are taken to the base e. 
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where ¢,; represents the amount by which 4; differs from its expected 


value, provides a simple method for estimating k. If the ¢; satisfy 
certain assumptions (see section 4), an estimate of k is given by 


k = D2, 


and the variance of & is estimated by* 


sj = >, 2’, 


where 
1 ~ 
+k 
TABLE 1 
FITTING OF A SINGLE HIT CURVE, EQ. (1) 
{Data from Lea, Haines and Coulson (1936)] 

Proportion Log S Dose in minutes 
surviving of exposure 
(S) (y) (x) 

.90 —0.105 
—0.174 2.5 
.80 —0.223 4.9 
.69 —0.371 9.8 
.58 —0.545 14.7 
.47 —0.755 27.7 
—1.715 55.6 
-06 —2.813 85.7 


> = 11.9499, = 11,548, zy = —370.707 
k = —(—370.707)/11,548 = .0320992 
s’ = 4[11.9499 + .0320992(—370.707)] 
= .007271 
si = .007271/11,548 = .630 x 10° 
= .0008 


*Indices on summation symbols and subscripts on variables have been omitted wherever the 
meaning is unambiguous. 
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In Table 1 the method is illustrated with data from Lea, Haines and 
Coulson (1936). Spores of B. mesentericus were exposed to the beta rays 
of a radon source for varying lengths of time. Different numbers of 
loops were exposed at each dose in an attempt to keep the errors uniform. 
The estimate & = .0321 when expressed in reciprocal seconds is 


5.35 X 10°* which agrees well with the value 5.25 < 10‘ given by the 
authors. 


3. The multi-hit curve. 


Frequently populations consist of organisms each having n sensitive 
units all of which must be inactivated before the organism will lose its 
viability. If the hits are independent and if k is the same for each unit, 
the probability that all n units are inactivated is (1 — e**)".’ Accord- 
ingly, the expected proportion of a population surviving a dose z is 

One method for fitting survival curves based on this model has been 
used widely. For large doses 


so that in an experiment resulting in observed proportions surviving S,; 
over a range of doses z; (¢ = 1, --- , p), the equation 


(3) y; = log S; = logn — kz, + €; 


leads to a method for estimating n and k. In practice the data are plotted 
on semi-logarithmic paper and only those points at large doses which 
appear to lie nearly on a straight line are used in fitting (3). If the 


assumptions about ¢; are correct, estimates of k and n are given by the 
familiar formulas 


a 


k 


-Le- - 9/Le - 2 
= 9 + ke 
where £ and g are arithmetic means. The variances of & and log 7 are 
si = 8°/ (x — 
= 8° (x — 4)” 


where 
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TABLE 2 
FITTING OF A MULTI-HIT CURVE, EQ. (3) 
[Data from Pomper (1952)] 


Proportion X-ray dose in 104 
surviving Log S roentgens 
(S) (y) (x) 
. 86 
64 2 
072 —2.631 6 
020 —3.912 8 
0056 —5.185 10 
0020 —6.215 12 
00045 —7.706 14 
> (x — = 70.00 = = 9.00 
(y — = 26.158162 = —4.535 


lor) 


(x — Hy — = —42.75 
k = —(—42.75/70) = .610714 
log i = —4.535 + .610714(9.00) = .9614 


nh = 2.62 
s’ = }[26.158162 + (.610714)(—42.75)] 
= .01254 
= (.01254)(556)/6(70) = .0166 
Siogs = «13 
si = (.01254)/70 = .000179 
se = .0134 


This method is illustrated in Table 2 with some hitherto unpublished 
data of Dr. S. Pomper of Oak Ridge National Laboratory. These sur- 
vival data are from diploid cultures of Saccharomyces cerevisiae exposed 
to varying doses of X-radiation. A preliminary plot indicated that doses 
equal to or greater than 40,000 roentgens would satisfy approximation 
(2). Only these points have been used in the calculations. 
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Although at first glance this technique seems satisfactory and appeal- 
ing because of the simplicity of the calculations, closer scrutiny reveals 
that n and k are actually very poorly estimated. Since the organism is a 
diploid we know that n = 2. But if this information were not known, 
the 95% confidence interval for ’, which is (1.8, 3.7) would tell us only 
that it is a diploid or a triploid, possibly even a tetraploid. Furthermore, 
as the number of sensitive units per organism increases, the confidence 
intervals computed from this method become progressively worse. 
Actually the poor quality of the estimation is only a reflection of the 
well known fact that confidence intervals for a regression line are ex- 
tremely wide outside the range of the observed data. In other words the 
gain xu simplicity achieved by virtue of approximation (2) is more than 
offset by an increase in the errors of estimate. The same argument 
without modification cannot be applied to the estimation of k by this 
method, but as it turns out in this example, k is also poorly estimated. 
Clearly, some approach which permits the use of all points on the curve 
must be used. 

As an alternative to (3) we may write 


(4) u, = log (1 — =n log (1 


which invoives no approximations. The estimation procedure requires 
the minimization of the quantity 


V= lu; — n log (1 — 


for variations in n and k. The equations for # and & so obtained are 
non-linear and in general must be handled by iterative methods. How- 
ever, for most experiments, a short-cut method is available which ordi- 
narily will give reliable results. Stepwise the method may be described 
as follows: 

1. Select a trial value of k, say k, , which may be obtained from: a 
plot of the higher dose points. . 

2. Find the value of n, say m) , which minimizes V fork = k,. It is 


mo = w/ 
where 
v = log (1 — e*”). 


3. Compute Vain = >. u? — no >, w. 
4. Repeat the process for two or three more trial values of k, chosen 
so as to bracket the absolute minimum of V4. . 
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5. Plot each V,,;, against its corresponding trial value of k and inier- 
polate to obtain the value of k which makes V an absolute minimum. 

6. This final value of k and the corresponding n computed from the 
formula in step 2 are the desired estimates & and “i. 

In selecting an initial trial value of k, it should be remembered that 
the estimate of k obtained from the semi-logarithmic plot of the higher 
dose points will usually be too large. In Pomper’s data the estimate 
was 0.61, so the initial trial value chosen was 0.57. The computations 


TABLE 3 
FITTING OF A MULTI-HIT CURVE, EQ. (4) 
(Data from Pomper (1952)] 


Proportion log (1 — S) log (1 — e7-5?) 

killed 

1-S u .57z 1 — v 

14 —1.9661 .57 .4345 — .8336 
36 —1.0217 1.14 .6802 — .3854 
79 —0.2357 2.28 8977 — .1079 
928 —0.0747 3.42 9673 — .0333 
980 —0.0202 4.56 9895 — .010§ 
9944 —0.0056 5.70 9966 — .0034 
9980 —0.0020 6.84 9989 —.0011 
99955 —0.0004 7.98 9997 — .0003 


= 4.971023, > ov? = 856210 >> w = 2.060758 
Mo = 2.060758/.856210 = 2.406837 
Vinin = 4.971023 — 2.406837(2.060758) = .01111 


RESULTS OF SUBSEQUENT TRIALS 


k 
55 .00931 
50 .00938 


for steps 2 and 3 are shown in Table3. Note that the quantity, >> u’, has 
to be computed only once, since it does not involve k. Complete calcu- 
lations are given only for k = 0.57. The results for two more trial 
values are given at the bottom of the table. The points are plotted in 
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10° 


FIGURE L GRAPHICAL ESTIMATION OF K 


Figure 1, and the curve drawn free hand through the points indicates an 
absolute minimum at about k = 0.525. Using this as the final & we find 
> v? = 1.0065, >> w = 2.2349 and A = 2.2205. These estimates were 
found to satisfy the true estimation equations to three significant figures. 

The computation of standard errors for estimates A and & obtained 
from model (4) is complicated by the fact that the estimates are not 
linear functions of the u; . In this case the only practical method of 
obtaining error estimates is to compute the asymptotic variance-covari- 
ance matrix. The formulas, derivation of which is outlined in section 4, 
are 


s*[B/(AB — C’)] 
s'[A/(AB — C’)] 
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where 
A=) 
** 
) 
TABLE 4 
CALCULATIONS FOR STANDARD ERRORS OF nz AND k, EQ. (4) 
(4) = (3) + 
(1) (2) (3) fl — QI (5) 
z ee a@—e*) v 
1 .5916 .5916 1.4483 — .8954 
2 . 3499 . 6999 1.0766 — .4307 
4 1225 4898 .5582 — .1306 
6 0428 . 2571 . 2686 — .0438 
8 0150 . 1200 .1218 — .0151 
10 0052 0525 .0528 — .0053 
12 0018 0221 .0221 — .0018 
14 0006 0090 .0090 — .0006 
= > = 1.0065 
B = f? (4)’ = (2.2205)*(3.6586) 
= 18.0390 
C =A > (4)(5) = (2.2205)(—1.8473) 
= —4,1020 


? 4[4.971023 — 2.220471(2.234919)] 
001408 
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The calculation of A, B and C is carried out in Table 4. The columns are 
set up in an orderly fashion and B and C can be obtained from sums of 
squares or cross-products of the appropriate columns. Other tabular 
arrangements may be found more suitable for some computers. Finally 
the standard errors are found to be s; = .14, s; = .033. Although the 
number of degrees of freedom to be associated with s” is not well defined, 
it is suggested that for purposes of computing confidence intervals, 
(p — 2) be used in conjunction with Student’s ‘‘t’” in the usual manner. 
When this is done the approximate 95% confidence intervals for ” and & 
are (1.9, 2.6) and (.44, .60), respectively. 

The computational labor in fitting model (4) is greater than in fitting 
model (3), but it is not excessive. In any particular case, the additional 
work needed to obtain more accurate estimates must be balanced against 
the consequences of possibly incorrect or misleading conclusions which 
might result from the simpler analysis. In the example just discussed, 
the simpler analysis might have led to the erroneous conclusion that the 
original culture was a triploid and would have overestimated k. Whether 
such errors can be tolerated will depend on the individual experimenter 
and the significance of a particular experiment. 

More general models have been discussed by Atwood and Norman 
(1949). The population may consist of organisms each with a different 
number of sensitive units. In Neurospora crassa, for example, popula- 
tions of macroconidia are found in which cells have different numbers of 
nuclei. In this case the method of fitting just described estimates the 
average number of nuclei per cell in the population. A further generality 
is introduced if the sensitivities of the n units in an organism are not the 
same, i.e., if each unit has a different k. The fitting of curves based on 
this model would necessitate the use of an iterative solution of estimation 
equations. Because of the length and complexity of calculations re- 
quired for such methods, their use is limited. 


4. Some comments on the theory. 


The statistical methods employed in the previous sections depend 
first of all on the assumption that the e; are independent normally dis- 
tributed random variables with zero expectations and a variance which 
is the same for all7. The truth is that they are independent and do have 
zero expectations, but they are not normally distributed and do not have 
uniform variances. In most cases the survival proportions are obtained 
from the ratio of two bacterial counts which suggests that the «; might 
behave somewhat like the ratio of two Poisson variates. Conceivably 
one could find a transformation which would render the variances nearly 
uniform and perhaps induce approximate normality, but it would likely 
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have two disadvantages. It would probably do poorly at very high or 
very low survivals, the latter having occurred in the example just dis- 
cussed, and it would make the introduction of a specific model relating 
survival and dose even more difficult than it already is. The problem 
of heterogeneous variances is often handled quite adequately by the 
experimenter simply by increasing the number of plates or loops at 
doses which are expected to have e, with high variances. One never 
achieves perfect homoscedasticity by such procedures but they usually 
suffice for most practical purposes. As for the lack of normality, the 
logarithm of the ratio of two Poisson variates in which the denominator 
is determined much more accurately than the numerator might be 
expected to have a moderately symmetrical distribution and therefore 
not depart so far from normality as to cause any serious difficulties. 

The standard errors of 7” and & given in the last section are obtained 
from the asymptotic variance-covariance matrix which is derived in the 
usual manner. If the «; satisfy the assumptions in the previous para- 
graph, the probability of the sample is 


and the likelihood is L = log P(S). Then the asymptotic variance- 
covariance matrix is the inverse of the matrix 


a*L aL 
on? on dk A 
aL aL 
an dk ak? 


where A, B and C are defined as in section 3. In practice, o’, n and k 
must be replaced by their estimates, s”, i and &. This of course means 
that the standard error estimates are only approximate, and a further 
ambiguity is introduced when an attempt is made to estimate confidence 
intervals for n and k. One reasonably safe procedure is to associate 
(p — 2) degrees of freedom with s’ and to compute confidenec intervals 
using Student’s “¢”. It is difficult to determine the accuracy of such 
methods. A rough check, however, seems to indicate that they are 
conservative. If k is assumed to be known and equal to 0.525, the vari- 
ance of 7, which is then properly estimated, is 0.001396. The variance 
‘ of # computed from the asymptotic variance-covariance matrix is over 
ten times as large as this, a fact which would seem to support the conten- 
tion that the approximation is on the conservative side, 
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5. Summary. 


The fitting of theoretical survival curves to data obtained from radia- 
tion experiments with microorganisms is discussed from a statistical 
point of view. Formulas are given for obtaining estimates of parameters 
and standard errors. The methods are illustrated with two sets of 
experimental data. 

The author gratefully acknowledges many helpful discussions with 
Dr. K. C. Atwood and is indebted to Dr. S. Pomper for permission to 
use his hitherto unpublished data. 
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POPULATION GROWTH OF THE SEXES 
Leo A. GoopMan! 


University of Chicago 
1, INTRODUCTION 


Many authors (see, e.g., references 1-19) have noted that the sex 
distribution at birth is not equal (more males than females are born) and 
that the ability of the sexes to withstand the forces of mortality is not 
the same (the death rates at most ages are higher for males than for 
females). This phenomenon is not confined to man alone, but it also 
occurs, though not universally, among a number of other species. The 
literature contains many discussions of the importance and implications 
of these two facts. 

At a recent meeting of the Royal Statistical Society several speakers 
pointed out that the possibility of variations in the relative numbers of 
the two sexes has been too long neglected in population mathematics. 
D. G. Kendall [22] has discussed some of the characteristic difficulties 
of this problem and suggested some approximations which we shall 
extend. Some work on this problem, from a deterministic standp6int, 
has been carried out (see, e.g., references 1, 13-19). Kendall has men- 
tioned that the problem of expressing the modes of population growth 
for the two sexes in stochastic form would be a very difficult one. 
References 20-26 deal with this problem when one sex is considered and 
their bibliographies give references to much of what has been written in 
the field. 

J. Yerushalmy, in a very interesting paper [1], describes the age-sex 
composition of the population resulting from natality and mortality 
conditions. If we are interested in the ultimate stationary population 
we may determine its over-all sex ratio from Yerushalmy’s analysis. 
In this paper we shall consider the problem of determining the over-all 
sex ratio for populations which need not be stationary and also the prob- 
lem of studying the population growth of each sex. 

We shall consider various mathematical models of population growth. 
In order to make possible an analytic treatment of the subject, these 
models necessarily will be oversimplifications of reality. 


iThis paper was prepared in tion with r h supported by the Office of Naval Research. 
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2. DETERMINISTIC MUDELS 


Extending Kendali’s [22, p. 247] approach we have the following 
model: If M(t) and F(t) are the number of males and females respec- 
tively at time ¢, then these quantities satisfy the differential equations 


aM = + A(M, F) 
dF 
at = —bF + A’(M, F) 


where a is the intrinsic male death rate ner male per unit of time, b is the 
intrinsic female death rate per female per unit of time, A(M,F) and 
A’(M,F) are functions of M and F representing the contributions from 
the male birth rate and female birth rate, respectively. (Kendall dealt 
with the case where a = b and A = A’.) 

Consider the case where A(M,F) = uF and A’(M,F) = vF; ie., 
where the birth rates depend on the female population size F (females 
are marriage dominant, see [17], [18]). We then have 


dM 
—aM + uF 


—bF + oF = — 
The solution of these equations is 


uA 
@—b+a)° 
F(t) 


where A and B are determined by the population composition at t = 0. 
We have that the sex ratio is 


M(t) _ Be — b +4) v—b+a 


M(t) (o—b)t + 


and the ultimate sex ratio is S = S(~) = u/[v + a — bj, when 
v+a-—b>0. We note that M and F tend jointly to infinity when 
v — b > 0, and to zero when v — b < 0. When the female death rate 
equals the female birth rate, v — b = 0, M and F tend to nonzero limits 
and the sex ratio approaches S = u/a the ratio between the male birth 
rate and the male death rate. 

Now consider the case where A(M,F) = uM and A’(M,F) = vM; 
ie., where the birth rates depend on the male population (males are 
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marriage dominant). By applying the methods of the ees case 
we may determine M(t), F(t), S(t), and we find 


u—-a+t+b 
v 


S = 
when u—a+b6>0. Also, M and F behave qualitatively in the same 
way as before. When u — a = 0, M and F tend to nonzero limits, and 
S = b/v. 

Let us consider the case where 


and 


A'(M, F) = 


i.e., where the contributions from the birth rates depend on the total 
population size M + F (neither males nor females are marriage domi- 
nant). We then have 


+8) = b + 42 M = oF + kM, 
where 
v u v 
f=5-4 g9=5- 4, c= > k = 


The solution of these equations is 


F(t) = + 


where h = +V/(f — g)* + 4ck, and A, B are determined by the popu- 
lation composition at ¢ = 0. Hence the sex ratio is 
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and the ultimate sex ratio is 


S = S(@) = 2c/[g-f +h] 


We could have determined that S was one of two values directly by 
noting that 


asjat = ) /a = [ Fam 


| + cF? — gFM — | 
=(f-—gS+e—kSs. 
The solution of this equation for 2S/dt = 0 is 


—2k —2k 
g—-f—h_ 2c = ) 


We note that M and F tend jointly to infinity when f + g +h > 0, 
and to zero when f +g + h <0. Whenf +g +h =0, M and F tend 
to nonzero limits and S = u/—2f = u/(2a — u) (2b — v)/v. The 
critical relation between the constants is . 


(f+ 9) =(f — g)’ + 4ke 
fg = ke 


(2a — u)(2b — v) = w 
ajv — b) + btu — a) = 


which is a generalization of Kendall’s critical relation [22, p. 248]. 

Let us now consider the case where A(M,F) = uV MF and 
A’(M,F) = vV MP, i.e., where the contributions from the birth rates 
depend on the geometric mean of M and F. We then have 

dM 


“at = —aM +uVMF 


a 
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Writing VM = X, VF = Y, 


2X aX —aX’? + uXY 
dt 
ax 
IX 

and 

dt 
adY 


where —a/2 = f, —b/2 = g, u/2 = c,v/2 = k. Hence using the results 
for the preceding model, we can compute X(t), Y(t), X°(t) = M(d), 
Y*(t) = F(é), and S(t) directly. We find that 


S = S(m) = {2/[g — f + 


= | 2 + 
Also M and F behave qualitatively in the same way as before, the critical 


relation between the constants being 
fg = ke, or ab=w, 


in which case 
S = [u/—2f]* = = [b/r)’. 

It is interesting to note that in all the models we have considered 
heretofore, when the population size tends to a nonzero limit, the sex 
ratio is what one would expect of a stationary situation; i.e., S = bu/av, 
which equals u/a when v — b = 0, and b/v when u — a = 0, and (b/v)’ 
when ab = wy (in the preceding model). 

Consider now a model which discriminates between married and 
unmarried persons. Let M, F, and N denote at time ¢ the number of 
unmarried males, unmarried females, and married couples, respectively. 
Then a natural generalization of the preceding models is governed by the 
equations 


dF 
—bF + vN + mN — K(M,F) 
dN 


a + K(M,F) 
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where a is the unmarried male death rate per unmarried male per unit of 
time, b is the unmarried female death rate per unmarried female per unit 
of time, u is the male birth rate per married couple per unit of time, 
v is the female birth rate per married couple per unit of time, m is the 
married male death rate per married couple per unit of time, n is the 
married female death rate per married couple per unit of time, and 
K(M,F) is the marriage rate per unit of time (the age distribution is, of 
course, being neglected throughout). 

In the case where, K(M,F) = cF (female marriage dominance), the 
equations reduce to 


= —aM + dN oF 
F 

= + kN 

dN 


wherred =u+n,g = —b-—c,k =v+m,f = —(m+n). The 
solution of these equations is 


+ Bid(f — g — h)/k — Qe?" /[f + g —h + 2a] 
+ De 


where h = +V(f — g)? + 4ck, and A, B, D are determined by the 
population composition at ¢ = 0. 
Hence the sex ratio of unmarried people 
M(t) 


and the overall sex ratio 


MO) + NY 
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may be evaluated. We find that 
S’(o) = [2d + f — g — 2allg — f +h) 


and 


We also note that M and F behave qualitatively in the same way as 
before, the critical relation between the constants being 


fg=k, or (m+n\(b+c) = + mpc, 


in which case, 


S’(o) = —(d + fhc/af = + f)g/ak 
= (u — m)(b + c)/av + m), 


and 


=(u— mece+ m+n). 


We see that the sex ratio differs from what one obtains in a simple 
stationary situation. 

A similar analysis could be carried out for models which discriminate 
between married and unmarried persons and where the males are mar- 
riage dominant (K(M,F) = cM) or where neither sex is marriage domi- 
nant (e.g., K(M,F) = cV MF or K(M,F) = c(M + F)/2). 


3. SIMPLE STOCHASTIC MODELS 


Let us consider a slight generalization of a stochastic model described 
by Moran [25]. If B(é) is the total number of males, and G(¢) the total 
number of females born since time ¢t = 0, and if B(t) = c, and G(t) = g 
at time ¢, the probabilities at time ¢ + dt are given by 


Pr {B(t + dt) =c + 1} = udt + o(dt) 
Pr {B(t + dt) = c} = 1 — udt + o(di) 
Pr + dt) = 9 + 1} =v dt + o(dt) 
Pr {G(t + dt) = g} = 1 —vdt + oft), 
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where B and G are independent. The distribution of length of life may 
be of any general form and we suppose that the average length of life for 
the males is Ly = 1/a, and the average length of life for the females is 
Ly = 1/b. Then it is not too difficult to see that if M(t) is the expected 
number of males alive at time ¢, and F(t) is the expected number of 
females alive at time f, then the sex ratio will ultimately be 


im “MO _ 
lim F(t) ~ 


This result is the same as that of the corresponding deterministic model 


S(o) = = bu/av. 


aM 
—aM 
dF 


Now consider the following process: In order that the total population 
size remain constant, when a person dies there is a “replacement” made 
(a birth takes place) which will be either a male or female with chances | 
u/(u.+ v) and v/(u + v), respectively. The chance that a given male 
will die in a unit of time is a, and b is the chance that a given female will 
die. We might now consider a given male and study its life span and 
also the life span of its “‘replacement,” and the life span of the “‘replace- 
ment’s replacement,” etc. The probabilities at the tth unit of time are 
given by 


Pr.{M} = Pr.i{M}(l — a) + 


where Pr,{M} is the probability that the “replacement” at the ¢th unit 
of time for a given male is a male, Pr,{F} is the probability that the 

“replacement” at the ith unit of time for a given male is a female, 
Pr,{M} + Pr,{F} = 1. The value of Pr,{M@} may be computed 
explicitly as a function of a, b, u,v, and ¢. As ¢ approaches infinity, we 
find that 


and, hence, 
Pre{M} _ bu 
Pr.{F} av" 


We consider only the case where Pr,{M} = 1, since the more general 
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case may be analyzed using only the special case. Since the total popula- 
tion size is a constant N, the expected number of males alive in the tth 
unit of time is N Pr,{M}, and the expected number of females alive in 


the tth unit of time is N Pr,{F}. Therefore, the overall sex ratio for the 
population will ultimately be 


S(o) = bu/av. 


The model we have just considered is a special case in the general 
theory of renewal. An excellent exposition of this topic appears in [26]. 
It may be shown that more general assumptions concerning the distribu- 
tion of length of life still lead to the same results when ¢t approaches 
infinity. 

The preceding simple stochastic models have been presented more as 
an illustration of the problem than as a solution to it. We shall now 
consider more complicated stochastic models. 


4. STOCHASTIC MODELS OF POPULATION GROWTH 


We shall now consider a stochastic process which is a probabilistic 
analogue of the deterministic model describing a population where the 
females are marriage dominant; i.e., where the birth rates depend on the 
female population size." Although the analysis which follows pertains to 
populations where the females are marriage dominant, the results ob- 
tained may be used in an obvious manner to analyze populations where 
the males are marriage dominant. 

The population under consideration behaves in accordance with the 
following rules: 

(a) the subpopulations generated by two co-existing individuals de- 
velop in complete independence of one another; 

(b) a female alive at time ¢ has a chance u di + o(dt) of giving birth 
to a male and a chance v dt + o(dt) of giving birth to a female during the 
following time interval of length dt; 

(c) a female at time ¢ has a chance b dt + o(dt) of dying and a 
male alive at time ¢t has a chance a di + o(dt) of dying in the following 
time interval of length df. 

Let the random variables M(t) and N(t) denote the number of males 
and females, respectively, alive at time t. We see that the female popu- 
lation follows the rules of a simple birth-and-death process (cf. [22], 
p. 236), the mean female population size being 


Nit er! 


1The author wishes to express his indebtedness to David G. Kendall of Magdalen College, Oxford 
and Princeton University, and Charles Stein of the University of Chicago for their assistance in the 
analysis of this stochastic process. 
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while 


when V(0) = 1. Since the subpopulations generated by two coexisting 
individuals develop in complete independence of one another, we need 
only multiply N(t) and Var{N(¢t)} by N(O) in the more general case 
where there are N(0) > 1 females alive at the initial time ¢ = 0. There 
will be ‘‘almost certain” extinction unless the mean female population 
size is increasing (v > b). When v = b ana the mean female population 
size is constant, we still find that there will be ‘‘almost certain’’ extinc- 
tion. 

The behavior of the male population is somewhat more difficult to 
analyze since male births depend on the female population. The method 
of analysis will be described in the Appendix. We find that the mean 
male population size is 


u (o-b)t _ 
MO 
while, 
var (M(O} = — b) a 
atv — — b+ 2a) (v— b+ a)’ 


when M(0) = 0 and N(O) = 1. In the more general case where 
N(0) > 1 females are alive at the initial time ¢ = 0, the formulas M (t) 
and Var{M(t)} are multiplied by N(0), since the subpopulations generate 
independently, 

In the more general case where there are M(0) > 9 males at the 
initial time ¢ = 0, these M(0) males follow the rules of a simple death 
process; i.e., the chance that M,(t) from among the original M(0) males 
will be alive at time ¢ is 


where p(t) = 1 — g(t) = e**. Hence, 
= 
and 


Var {M@,(t)} = [1 — 
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We then add 1/,(¢) and Var{ 47,(¢)} to M(é) and Var{ A/(t)}, respectively, 
to obtain the mean and variance of the total number of males when there 
are M(0) males and N(0) females at time ¢ = 0. These additions have, 
of course, no influence on the form of the solution when t >. 

We find that the covariance between M(t) and N(t) is 


Cov = E{{M(t) — — 


u(y + b) u (v + b) 
~ @~—b+a) \w— a aw—b° 


when M(0) = 0 and N(0O) = 1. In the more general case where 
N(0) > 1, M(O) > 0, then the covariance is multiplied by N(0). 

In the special case where the constants are equal (a = b = u =v = d), 
we have 


=NO, Mw = NOL + MOec™, 
Var {N(t)} = 
Var {M(t)} = — 2+ 3e™ —e™] + MO)e™ [1 — e™*], 
Cov () = N(0)[2At — 2 + 2e™'], 


and the correlation coefficient p(t) = Cov(t)/V Var {M(t)} Var {N(t)} 
between the number of males and the number of females in the popula- 
tion is seen to approach one as ¢ becomes large. 

The methods which we have used to analyze the stochastic process 
representing a marriage dominant (female or male) population, can be 
extended in order to analyze a stochastic process where the contributions 
from the birth rates depend on the total population size (neither male 
nor female is marriage dominant). That is, the population under con- 
sideration behaves in accordance with the following rules: 

(a) the subpopulation generated by two coexisting individuals de- 
velops in complete independence of one another; 

(b) an individual alive at time ¢ has a chance u dt/2 + o(dt) of repro- 
ducing a male and a chance v dt/2 + o(dt) of reproducing a female during 
the following time interval of length dt; 

(c) a female alive at time ¢t has a chance b dt + o(dt) of dying and a 
male alive at time ¢ has a chance a di + o(dt) of dying in the following 
time interval of length dt. 

Using the second method described in the Appendix we may obtain 
the moments and cross moments of M(t) and N(¢), the number of males 
and females, respectively, alive at time ¢. The means, variances, and the 
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covariance of M(t) and N(t) satisfy a system of five differential equations 
with constant coefficients which may be solved to determine these 
moments explicitly. 

Let us consider the special case where the constants are all equal 
(a = b = u=v =). Then the probability P,,,,(¢) that at time ¢ the 
population contains m males and n females may be determined explicitly 
when the initial conditions are Py,,(0) = P,,.(0) = 1/2. We find that 


form + n > 1, and 


Po.o(t) = dt/(1 + At). Whence we see that there will be “almost 
certain” extinction. The probability P,,(é) that at time ¢ the population 
contains m males, and the probability P;(¢) that at time ¢ the population 
contains n females may also be determined explicitly. We have 


Pa) = Pd = 2/0 


form > 1 and 


P,(t) = Pot) = (1 + Ad/(2 + dd). 


We find that M(t) = N(t) = 1/2, Var{M(t)} = Var{N(t)} = (2at+ 1)/4, 
Cov(t) = (2At — 1)/4, and the correlation coefficient p(t) = (2\¢ — 1)/ 
(2¢ + 1). 


5. APPENDIX 


In studying the stochastic process representing a population where 
the females are marriage dominant, the means, variances, and covari- 
ances of the male and female population sizes were given. The following 
method which was used to obtain the first and second moments could 
also be used to obtain higher moments: 

All moments for the female population size may be obtained directly 
from its distribution which is a geometric series with a modified zero 
term (cf. [22] p. 237) or from its moment generating function. All 
moments for the male population size may be obtained from its moment 
generating function ¢(z,t) = E{z““’}. We find that ¢(z,t) satisfies the 
differential equation 
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and that ¢(z,0) = z, when M(0) = Oand N(0) = 1. By differentiating 
(1) once and setting z = 1, we obtain a linear differential equation for the 
mean M(t). By differentiating (1) several times and setting z = 1, we 
obtain linear differential equations for the higher factorial moments uf 
M(t). 

The method just described gives the moments of M(t) but does not 
give the cross moments of M(t) and N(t); e.g., the covariance. Arother 
method might be used to obtain the moments and cross moments of M(t) 
and N(t) which serves as an independent check on the preceding method. 
The two methods might be used simultaneously to simplify computation. 
Let P,,,,(t) be the probability that at time ¢ the population contains m 
males and n females. Then the infinite system of functions P,,.,(t), 
m= 0,1,2,---,n = 0,1, 2, --- , satisfy a basic system of ordinary 
differential equations (cf. [23], Chap. 17). By multiplying each differ- 
ential equation by an appropriate factor and then adding the entire 
system of differential equations, we obtain ordinary differential equa- 
tions for the moments; e.g., if we multiply the differential equation 
containing the term [0P,,,,,(é)]/d¢ by m and then add the entire system of 
differential equations, we will obtain an equation containing the term 


OP 
ot 


which is in fact [aM (t)]/ot. We then have a differential equation in 
M(t) which may be solved directly. 
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ESTIMATION OF VARIANCE AND COVARIANCE 
COMPONENTS* 


C. R. HENDERSON 


Cornell University 
INTRODUCTION 


The theory of variance component analysis has been discussed 
recently by Crump (1946, 1951) and by Eisenhart (1947). These 
papers and, indeed, most of the published works on estimating variance 
components deal with the one-way classification, with ‘‘nested” classi- 
fications, and with factorial classifications having equal subclass 
numbers. Also most papers on this subject are concerned with what 
Eisenhart (1947) has called Model II; that is, all elements of the linear 
model save » are regarded as random variables. In the above cases, 
estimation of variance components is usually accomplished by com- 
puting the mean squares in the standard analysis of variance, equating 
these mean squares to their expectations, and solving for the unknown 
variances. These techniques are described in many statistical text- 
books. 

Unfortunately, research workers in some of those fields in which 
much use is made of variance component estimates are unable to obtain 
data which have the above described characteristics. This is par- 
ticularly true in those fields in which survey data must be used or 
where, even in a well-planned experiment, the subclasses are of quite 
unequal size due, for example, to differences in litter numbers. Also, 


*Presented at North Carolina Summer Statistics Conference June 24, 1952. 
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Model II is sometimes not appropriate. Instead the data more ap 
propriately correspond to what Eisenhart called the Mixed Mode: 
For example, the data may represent several different years, and tht 
vear effects should be regarded as fixed rather than as random variables 

It is the purpose of this paper to describe some methods for esti- 
mating variance components in the non-orthogonal case and to illus 
trate the methods with a small sample of butterfat records made b; 
cows resulting from an crtificial breeding program. The three method: 
described are: 


1. Compute sums of squares as in the standard analysis of variance 
of corresponding ortnogonal dat. Equate these sums of square: 
to their expectations ontained under the assumption of Model I 
and solve ior the unknown variances. 

2. Obtain least squares estimates of fixed effects, ‘‘correct’’ the dat. 
according to these estimetes of the fixed effects, and then usin 
the corrected deta in place of the original data, proceed as ir 
Method } 

3. Commute mean squares by a conventional least squares analysis 
of non-orthogonal data (method of fitting constants, weightec 
squares of means, e.g.). quate these mean squares to thei 
expectations and solve for the unknown variances. 


These three methods henceforth called Method 1, Method 2, an 
Method 3 vary greatly in computational labor. Method 1 is the 
simplest. Method 2 in many cases is only slightly more difficult 
Method 3 is usually much the most laborious. Method 1, however 
leads to biased estimates if certain elements of the model are fixed o 
if some of them are correlated. Estimates obtained by Method 2 ar 
free of the first of these biases, but not of the second. Method 3 yield 
unbiased estimates, but the computations required may be prohibitive 
The relative sizes of the sampling variances of estimates obtained b: 
these three methods are not known. 


DESCRIPTION AND ILLUSTRATION OF METHOLS OF ESTIMATION 


The Data 


In New York State most artificial breeding of dairy cows is ac 
complished with semen supplied by the New York Artificial Breeders 
Cooperative, Inc. ‘This cooperative organization has approximately 
60 bulls in service. The operations of the organization are conductec 
in such a manner that it is largely a matter of chance to which bull’ 
semen a particular cow is bred. This fact as well as the large numbe 
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of daughters sired by each bull make the production records of these 
daughters particularly suitable for studying the genetic differences 
among bulls and for estimating the magnitudes of other sources of 
variation in milk production records. Good estimates of these variances 
are needed in designing efficient testing and selection programs. 

The difficulties in estimating the pertinent variance components 
are typical of those faced by research workers in animal breeding and 
in other fields as well. The difficulties in the present example are due 
to the following causes: 


1. Several years’ data are involved and time trends are known to be 
important. 

2. The two major classifications of the data are sire and herd. The 
number of sires exceeds 100 and the number of different herds 
exceeds 2000. 

3. The number of observations per herd-sire subelass varies; the 
majority being 0. 


We have estimated from these data the pertinent variances. Both 


Method 1 and Method 2 have been employed and have yielded estimates 
A small sample of records is presented in this 


| tgs essentially the same. 
. paper and the three methods of estimation are illustrated. 

Table 1 shows the number of first lactation butterfat records in 
each of the year X herd X sire subclasses And also the sum of the 


records for each of these subclasses. 
TABLE 1 
Year 
“Herd Sire Total 
Sf 1 2 3 4 
1 1 3-1414 | 2- 981 5-2395 
: 4 1 2 4-1766 | 2- 862 6-2628 
Ae 1 3 5-1609 | 5-1609 
; 2 1 1- 404 | 3-1270 4-1674 
2 2 5-2109 5-2109 
2 3 4-1563 | 2-740 | 6-2303 
3 1 3-1705 | - 3-1705 
3 2 4-2310 | 2-1134 6-3444 
4 1 3-1113 5-1951 8-3064 
4 3 3-1291 | 6-2457 | 9-3748 
Total 7-2931 | 21-9983 | 16-6959 | 13-4806 | 57-24679 
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The Linear Model 


Let y;;. denote the record made in the h-th year by the k-th daughter 
of the j-th sire in the i-th herd. Suppose that the appropriate linear 
model representing these observations is 


= hi +s: + (hs) + 


jz=l,---,yr Total number of filled subclasses = s 


u is common to all observations. a, is common to ail observations in 
the h-th year, h; to all observations in the 7-th herd, and s; to all records 
made by daughters of the j-th sire; (hs),; is peculiar to ail records made 
by the daughters of the j-th sire in the 7-th herd. Peculiar to each record 
is a random element ¢,,,. which is assumed to bave mean zero and vari- 
ance o.. The assumptions made concerning the other elements of the 
model are describe? for each estimat:cn method. 


Method 


Method 1 can be used only if it is assumed that, except for y, all 
elements of the model are uncorrelated variables with means zero and 
variances 04,0, 04s, OF This is, of course, the Eisenhart Model II. 

The following quantities are computed: 


2 
reELE 
Maree Ney 
H8= 


Dots in the subscripts denote summation. For example, 


Next the expectations of the above quantities are computed. Under 
the assumptions of Model II, the coefficients of u? and the variances in 
these expectations are as shown in Table 2. 


*Thig method was firgt suggested to me by Dr. S, Lee Crump, 
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TABLE 2 

i N N N N N N 
A N N Ky K2 Ks Pp 
HS N K, N N N s 
H N Ks; N' Ks Ke q 
S N K; Ks N Ks r 
CF N | Ks Kio Ku Ky 1 


N, p, q, T, 8 in the above table were defined in the statement of the 
linear model. K,; , K., --+ , Ky. must be computed as follows: 


> ni: 


h Np. i 
Misi 
ry h 
K.= Ky = n',./N 


K, = > Ky, = 


If the data were orthogonal, the sums of squares in the analysis of 
variance would be 


Among Years = A — CF 

Among Herds = H — CF 

Among Sires = S — CF 

Herds X Sires = HS — H — S+ CF 
Error = T — A— HS+CF 


If these same quantities are computed in spite of the non-ortho- 
gonality and are equated to their expectations, unbiased estimates of the 
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ae 
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variances can be obtained by solving the resulting equations. The 
necessary expectations are derived from Table 2. To illustrate, E 
(Among Years) = E(A — CF) = F(A) — E(CP). 

Computation of the K’s is facilitated by constructing from Table 
1 the following two-way tables of subclass numbers (Tables 3, 4, 5). 


TABLE 3 
| Year 
Herd |— — Total 
1 2 3 4 
| = 
» | 3 6 2 5 16 
2 1 3 9 2 15 
3 (| 0 7 2 0 9 
4 | 3 5 3 6 17 
Total 7 21 16 13 57 
TAPLE 4 
Year 
Sire Total 
| 1 2 3 4 
1 7 13 0 0 20 
2 0 s 9 0 17 
3 0 | 0 7 13 20 
Total 7 21 16 13 57 
TABLE 5 
Sire 
Herd Total 
1 2 3 
1 5 6 5 16 
2 4 5 6 15 
3 3 6 0 9 
4 8 0 9 17 
Total 20 17 20 57 


Also certain totals are computed from Table 1. 
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Year Herd Sire 
1. 2931 1. 6632 1. 8838 
2. 9983 2. 6086 2. 8181 
3. 6959 3. 5149 3. 7660 
4. 4806 4. 6812 
Total 24,679 Total 24,679 Total 24,679 


Using the above totals and the totals in Table 1, 
2931° 4806° 


2 2 
Hs = 5... 4 _ 10,970,369 
_ 6632? 6812” 
_ 8838’ 8181* | 7660° _ 
2 
CF = 2h = 10,685,141 


The expectations of these quantities are presented in Table 6. 
The computations of these entries proceed as follows: 


2 2 2 2 2 2 
from Table 3, +6 


- 1B = 19.51 
From Table 4, K, = 21 4 16 39.22 


2 2 2 2 2 2 
From Table 1, K; = + = 15.10 


2 2 2 2 
From Table 1, k,=2 42 4...48 = 37.35 


2 2 2 2 


37 + 5° + 37+ 6 


t+ 17 


= 21.49 


2 2 2 2 2 
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_77+13? , 13? 


From Table 4, K; = “90 17 + 30.33 
2 2 2 2 
+ 6” + 9? _ 
= 18.51 
2 2 2 2 
From Table3, K, = 
2 2 2 2 
From Table 3, Ky = = 14.93 
2 2 2 
From Table 4, K,, = 19.11 
2 2 2 
Frum Table l, K,. = = 6.19 
TABLE 6 
sz sy. 57 57. 57. 57 | 11,124,007 
A 57 57. 1.51 39.22 15.10 4 10,776,451 
HS 57 37.35 | 57. 57. 57. 10 10 ,970 ,369 
H 57 21.49 57. 24.04 24.04 4 10 ,893 ,666 
S 57 30.33 18.51 57. 18.51 3 10,776,278 
CF 57 16.05 14.93 19.11 6.19 1 10 ,685 ,141 


The equations to be solved are presented in Table 7. The first 
equation reads: 40.95 + 4.58 + 20.11 + 8.91 + 30% = 
91,310. 


TABLE 7 
A 40.95} 4.58| 20.11] 8.91] 3 | 91,310 
H —CF | 5.44] 42.07] 4.93] 17.85] 3 | 208,525 
S-—CF 14.28| 3.58] 37.89] 12.32] 2 | 91,137 
HS—-H-S+CF | 1.58} —3.58| 20.64] 4 | —14,434 
T-—A-—HS+CF |-21.30 | —4.58| -20.11| -8.91| 44 | 62,328 
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The solution to these equations is 2 = 763, 0, = 4531, 07 = 1587, 
Ore = —164, o2 = 2950. If a}, is set equal to 0, the solution is «2 = 756, 
o, = 4468, 0 = 1542, of = 2952. These estimates, of course, have no 
practical value for p, g, 7, and s are much too small for accurate estima- 
tion of the corresponding variances. The illustration of their computa- 
tion does, however, show that even with many different classes the 
computations are relatively simple. We have successfully adapted 
most of these computations to International Business Machines opera- 
tions. 

A difficulty with Method 1 is that it may be inappropriate to regard 
the year effects as random variables. If these effects actually are fixed, 
the estimates of oj , 7, , and oj, are biased. The estimate of of may or 
may not be biased depending on how it is estimated. This estimate is 
biased if obtained from the equations of Table 7. If, however, o? had 
been estimated from 


2 
A hit 
the within year X herd xX sire subclass sum of squares, the estimate 
would be unbiased regardless of the assumptions concerning the a, . 

It might be well at this point to state briefly a convenient procedure 
for finding the expected values of quantities like H, S, etc. Substitute 
for the y’s their corresponding linear models, and then remembering 
the assumptions concerning the elements of the model proceed to write 
out the expectations. For example, 


i 


+ ;h? > n?;;8; + s(hs) 


+ do + cross products all having zero expectation]/n.;; 
a 
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+ 


2 


Nw+ 


+ N(o, + + + 80% 


Method 2 


The bias in estimating variance components due to the assumption 
that fixed elements of the model are random variables can be eliminated 
by using Method 2. At the same time the relative simplicity of Method 
1 can be retained. Method 2 involves estimating the fixed effects by 
least squares, correcting the data in accordance with these estimates, 
and then applying Method 1 to the “‘corrected’”’ data. 

This method was used by Hazel and Terrill (1945) on data which 
were orthogonal except for the fixed effects. Their estimates were 
biased for they assumed for computational purposes that, except for 
fixed effects, the expectations of sums of squares of corrected data are 
the same as the expectations of the corresponding sums of squares of 
the uncorrected data. Method 2 enables one to appraise this bias and 
to correct for it. 

Before we apply this method to our example, let us consider the 
general case. Suppose the linear model is 


(1) Ye = >, ditia + a=1,---,N 

i=l 
The z’s are known. The e’s are uncorrelated with mean = 0 and 
variance = 

If the b’s are all fixed, the least squares equations for estimating 
them are as shown in (2). The b’s are, in fact, not all fixed in the 
variance components estimation problem, but they can be estimated 
by least squares as a matter of expediency. 


N 
(2) C\.b; = Y, Ci = Lilia 
> N 
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It is sometimes necessary to impose one or more linear restrictions 
on the estimates in order to obtain a solution to equations (2). 
Now suppose that b, , --- , b, are fixed and also that for all i = 1, 


(3) — b)? = Kye? 
It is not true that all least squares estimates have this property. For 
example, in our butterfat production example described in Method 1 
E(u — Kyo! 
Instead 


= + ot + + Kot 


Method 2 applies only to correcting data by least squares estimates 
for which (3) applies. It is not difficult to determine which 6’s qualify. 

Now the data are corrected as follows (in practice only certain linear 
functions of the observations need to be corrected): 


Suppose that for 7,7 = s+ 1,---,r< pallC,; = Owhenz 7. Let 
Z, = 


Then compute (5). Note that 


i=1 


(5) 


t=s+l1 


It is found that, except for o2 , the expectation of (5) is the same 
as the expectation of (6) with b, , --- , b, assumed = 0. 


(6) 


The coefficient of o% in the expectation of (5) is increased over that 
of (6) by the quantity. 


(7) ae 


t=1 j=1 
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C*' are elements of the matrix inverse to the matrix of C,; (i, j = 1. 
P), and 


= CC 
t=s-1 


Computation of (7) is simple if s is small and if least syuares equations 
(2) can be rewritten as (8). This can be done in many cases. 


(8) >> C.:b; = Y; i, 
D + Cid: = Y, t=s+il,---,p 
Noie in these equations that for all z, 7 =s+1,---,p, C;; = 0 
when 7 # 7. When equations (8) prevail, C*’ and b; (i,7 = 1, --- , 8) 


can be computed from equations (9). 
(9) = Yi =1,---,9) 
In equations (9) 


t=s+1 
The least squares estimates of b, , ‘-- , b, are the solution to equa- 
tions (9), and C*’ (i, 7 = 1, --- , 8) are the elements of the matrix 


inverse to the matrix of coefficients in (9). 

Let us illustrate Method 2 with the data of Table 1. We shall now 
assume that the a’s are fixed. First the least squares estimates of the 
a’s are computed. This is done most simply by estimating them jointly 
with d;; = u + h; + 8; + (hs);; . Thus the equations are reduced to 
the form of equations (8). Looking at Table 1 these equations are 


74, + + do + 3da = 2931 
and similarly for the other “a” equations 
34, + 24, + 5d,, = 2395 


and similarly for the other “d’’ equations 
Then by (9) these equations reduce to the ones shown in Table 10. 
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TABLE 10 
dy dz ds as 
3.825 | 3.825 0. 0 — 73.500 
—3.825 6.492 —2.667 0. 101.500 
0 —2.667 6.000 —3.333 41.333 
0 0. —3.333 3.333 — 69.333 


One restriction must be imposed before a solution is obtainable. A 
convenient one is d, = 0. Then the solution is @, = 12.08, @, = 31.30, 
4; = 20.80, 4d, = 0. 

Inverting the matrix of coefficients of Table 10 with fourth row and 
column deleted, the C*’ pertaining to the a’s are obtained. These are 
presented in Table 11. 


TABLE 11 
a az as 
936438 675000 300000 
675000 .675000 300000 
300000 . 300000 300000 


Now the data can be corrected for the 4’s. For example, the cor- 
rected total for the subclass pertaining to herd 1 X sire 1 is 2395 — 
3(12.08) — 2(31.30) = 2296.16. The corrected subclass and class 
totals are shown in Table 12. 


TABLE 12 
Sire 
Herd - Total 
1 3 3 
1 2296.16 2461.20 1609 .00 6366 .36 
1568 .02 2005 .00 2219.80 5792.82 
3 1611.10 3277 .20 4888 .30 
4 2871.26 3685 . 60 6556 . 86 
Total 8346.54 7743.40 7514.40 23 ,604.34 


4 
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Using the totals of Table 12 in conjunction with the subclass and 
class numbers of Tables 1 and 4, the corrected sums of squares are 
computed. Thus, H’ = (6366.36)’/16 + --- + (6556.86)?/17. These 
quantities are: 


HS’ = 10,016,791 S’ = 9,833,620 


Il 


H’ 9,954,295 CF’ = 9,774,822 


Next the amounts by which the coefficients of o% are increased in the 
corrected as compared to the uncorrected sums of squares are needed. 
Using (7), the P,; pertaining to HS’ are computed. Looking at Table 1, 


Pu= yg = 3.175 


5 4 8 
Table 13 presents the complete set of P’s for HS’. 


Py = 


TABLE 13 
a2 as 
3.175 3.825 0. 
3.825 14.508 2.667 
0. 2.667 10.000 


The sum of products of corresponding entries in Table 11 and Table 
13 is 


.936438(3.175) + --- + .300000(10.000) = 22.53 


Therefore, the coefficient of o2 in E(HS’) = 10 + 22.53 = 32.53. 
The P,,; for H’ can be computed by reference to Table 3 thus 


Py =tetisti7= 1.159 


— 3) , 18) , 36) _ 
Pa = tis t+ 7 2-7 


Table 14 presents the complete set of P;; for H’, 
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TABLE 14 
a, az a3 
1.159 2.207 1.504 
2.207 9.765 4.988 
1.504 4.988 6.624 


Multiplying these values by those of Table 11, the addition to o? 
in E(H’) = 16.54. 
The P;; for S’ are shown in Table 15. Referring to Table 4, 


7(13 
= 20’ = etc. 
TABLE 15 
a az a3 
2.450 4.550 0. 
4.550 12.215 4.235 
0. 4.235 7.215 


Then the addition to o: in E(S’) = 2.450 (.936438) + --- = 21.39. 
Finally the P;; for CF’ are 


7(21 
P,, = 57 P,, = ie etc. as shown in Table 16. 
TABLE 16 
a a, a3 
.860 2.579 1.965 
2.579 7.737 5.895 
1.965 5.895 4.491 


Multiplying and summing corresponding entries of Tables 11 and 16, 
the addition to the coefficient of o% in E(CF’) = 15.57. 
Table 17 shows the corrected sums of squares and their expectations, 
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TABLE 17 
Hs’ 57. 57. .| 57. 57. 32.53 | 10,016,791 
H’ 57. 57. 24.04 | 24.04 | 20.54 9,954,295 
s’ 57 18.51 | 57. 18.51 | 24.39 9 ,833 ,620 
CF’ 57 14.93 | 19.11 6.19 | 16.57 9,774 , 822 
The equations to be solved are presented in Table 18 
TABLE 18 
| | 
| | 
H’' —CF'| 42.07 4.93 17.85 | 3.97 | 179 ,473 
S'’—CF'| 3.58 37.89 12.32 7.82 | 58,798 
HS' — H’ — S8'+CF'| -3.58 | —4.93 20.64 | 4.17 3,698 


The estimate of o2 can be obtained readily from the residua! sun of 
squares after estimating the a’s and d’s, that is from 


— Reduction (a, , 4,,) 
A ry i k 

i k 


Reduction (a, , d;;) = 12.08 (—73.500) + 41.30 (101.500) + 20.80 
(41.333) + HS = 3149 + 10,970,369 = 10,973,518 


The residual sum of squares is therefore 11,124,007 — 10,973,518 = 
150,489, with expectation 44 o? . Consequently o? = 150,489/44 = 
3,420. Substituting o¢ = 3,420 in the equations of Table 18 and solving, 
the estimates of the variances are.o, = 3792, o: = 409, o;, = 243. It 
is not surprising that these estimates are different from the estimates 
obtained by Method 1. The sampling variances must be extremely 
large in both cases. 

Adapting Method 2 to a model with covariates is easy to accomplish. 
In some problems this would simplify the computations. For example, 
if many years were involved, the number of fixed elements in the model 
could be reduced by fitting a quadratic or cubic to years instead of 
estimating individual yearly effects as was done in this example. If 
the years exhibit no trend, the simplest procedure is to regard q’s as 
random variables and then to apply Method 1, 
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Method 3 


When it is computationally feasible, Method 3 is the most satis- 
factory of the three methods for estimating variance components. For 
one thing it gets around the difficulty of fixed elements in the model. 
For another, it yields unbiased estimates even though certain elements 
of the model are correlated. The manner in which interference by these 
correlations is eliminated is described subsequently. 

Unfortunately Method 3 is not likely to be computationally feasible 
in the non-orthogonal case unless the number of different classes is 
small or unless the design incorporates planned non-orthogonality and 
consequently the mean squares of the analysis of variance can be com- 
puted without solving least squares equations. In these two cases the 
expectations of the mean squares are easy to compute. For example, 
the analysis of the balanced incomplete block design is simple and so 
is the writing of the expectations of the mean squares. The basic facts 
needed for employing Method 3 are stated below. 

Let the linear model describing 7, , the a-th observation be 


(10) Yo = + Ca 

The z’s are known. The e’s have mean zero, are uncorrelated, and 
have common variance o7 . For the present we shall not specify whici: 
b’s are fixed and which distributed. 

Now if 6.4, , --: , 6, are set = 0, the least squares estimates of 
b, , -:: , b, are the solution to equations (11). 


etc. 


In equation (11) 


N 
Ci, = ja 


N 


a=1 


Y; 


The reduction in sum of squares due to 6, , --- , 6, is 


(12) 


i=1 
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But since 


i=) 
where C*’ are elements of the matrix inverse to the C;; matrix (i,j = 
i. q); 


i=1 
Using (13), the expectation of R(b, , --- , b,) is easy to write, but not 
necessarily easy to compute. 


(14) E[R(b, = DS CUE(Y.Y)) 


j=1 


Use will be made of the fact that (14) can be written 


j=l f=q+l j=l 


t=orl j=qtl 


where 


he = + when u 0, 


@ @ 
In most variance components problems £b,b; = 0 (i ¥ 7). Thus only 
the \,;,; need to be computed. q’ refers to the number of independent 
equations in (11). 
It is easy to verify that the expectation of the uncorrected total 
sum of squares is 


N 
a=1 j=1 
Now it becomes clear why the residual mean square has expectation 
o, regardless of the assumptions concerning the b’s. Making use of 
(15) it is seen that 


.j=1 
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Therefore, the expectation of the residual sum of squares is 


i=1 j=1 t=1 j=1 


= (N — p’)o, 


Suppose that 6,., , --: , 6, are independently distributed with 
means = 0 and common variance oc’. This variance can be estimated 
by equating R(b, , --- , b,) — R(b, , --- , b,) to its expectation. The 
expectation of this difference is seen by reference to (15) and (17) to be 


(19) > — + (p’ 


t=q+l 


(Ci; Ado + (p’ = 
Then using the estimate of o? arising from (18) an unbiased estimate 
of o” can be obtained by equating R(b, , --- , b,) — R(b, , --- , b,) to 
(19). It will be noted that the assumptions made concerning }, , --- , b, 
are of no consequence. 
Now we shall illustrate Method 3 with our data of Table 1. If one 
were to carry out the usual tests of hypotheses by least squares, the 
following sums of squares would be computed. 


Among Years = R(years, herd X sire subclasses) — R(herd X sire 
subclasses) 
Among Herds = R(years, herds, sires) — R(years, sires) 


Among Sires = R(years, herds, sires) — R(years, herds) 


Herds X Sires = R(years, herd X sire subclasses) — R(years, 
herds, sires) 


Residual = — R(years, herd X sire subclasses) 
i k 


The last four of these quantities can also be used to estimate o, , 
o. ,o,,, and o:. If years were regarded as random variables, the first 
would be used to estimate o2 . Our present assumption is that the year 
effects are fixed, however. 

According to (15) we need not be concerned with yu and the a’s in 
the expectations since the expectation of each of the above reductions 


4 
"hy 
au 
4h 
th 
af 
| 


VARIANCE AND COVARIANCE COMPONENTS 245 


contains the following quantity, 


Nw? +2 m..wa, + 
h h 


This expression vanishes in the sums of squares, which are differences 
between two reductions. 

Aside from y and a, terms, the expectations of the pertinent reduc- 
tions are those shown in Table 19. 


TABLE 19 
| 
Che 

N N N N 
R(years, herd X sire 

subclasses) N N p+s-—1 
R(years, herds, sires) N N Ki p+qt+r-—2 
R(years, herds) N pt+q-1 
R(years, sires) K, N Ks p+r-1 


The entries other than the K’s in Table 19 are derived from (15) 
and (16). The K’s are computed by (14). The expectations of the 
sums of squares of the analysis of variance are presented in Table 20. 


TALLE 20 
| 
Among Herds N — K, 0 Ki -— Ks |q-1 
Among Sires 0 N — K, | Ki-—K; |r-1 
Herds X Sires 0 0 N-Kijs-—-q-rt+l 
Residual 0 0 0 N-p-sil 


The computations of the various reductions proceed as follows in 
our example. R(years, herd X sire subclasses) = 10,973,518 as was 
shown in the description of Method 2. To obtain R(years, herds, sires) 
the equations of Table 21 need to be solved. In these equations u is 
estimated jointly with each a, , while h, and s, are set = 0. These 
three or some other set of three restrictions on the estimates are 
necessary. 
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tion is 


solution is 


herds, sires) 


TABLE 21 
a2 a3 a h he hs 31 82 
a; 7 0 0 0 3 1 0 7 0 2931 
Az 0 21 0 0 6 3 7 13 8 9983 
a; 0 0 16 0 2 9 2 0 9 6959 
a4 0 0 0 13 5 2 Q 0 0 4806 
hy 3 6 2 5 16 0 0 5 6 6632 
A hy 1 3 9 2 0 15 0 4 5 6086 
’ hs 0 7 2 0 0 0 9 3 6 5149 
7 81 7 13 0 0 5 4 3 20 0 8838 
: S82 Q 8 9 0 6 5 6 0 17 8181 
ve The solution is 
} a, = 414.77 h, = 6.13 §, = 2.48 
a, = 419.83 h, = —8.15 8 = 15.09 
a3, = 412.39 hs = 143.05 
a, = 368.59 


414.77(2931) + --- + 15.09(8181) 


= 10,921,107 


a, = 414.76 
a, = 422.99 
a; = 418.32 
a, = 366.30 


In order to compute R(years, herds) the s, and s, rows and columns 
of Table 21 are deleted and the resulting equations solved. The solu- 


h, = 11.35 
= —6.35 
= 150.16 


R(years, herds) = 414.76(2931) + --- + 150.16(5149) 


= 10,919,698 


a, = 425.46 
a, = 461.13 


The reduction due to years and sires requires solution of the equa- 
tions of Table 21 with the h, , h. , hs rows and columns deleted. The 


a3 = 407.72 
a, = 369.69 
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= —6.75 = 48.38 
R(years, sires) = 425.46(2931) + + 48.38(8181) 
= 10,800,679 


The computations of the K’s in Table 19 require inversions of 
certain matrices. To obtain K, , the inverse of the matrix of coefficients 
in Tuble 21 is needed. This inverse matrix is presented in Table 22. 
‘Lhe entries to the left of the dizgonal are omitted since the matrix is 
symmetric. 


TABLE 22 


| a | | he | hs | 81 S2 
|——-—| 

a 64207) 41685). 15893] 03469) — 07845 — 02812) — .05580| — .46226| — 22447 
| 42381). 16668 .03183 — 06608, — .04169' — .69296| — .38257| — .21929 
| 19176! .02719 — 03647) — .08558 -- G+440)— . 13108] — . 12625 
| | 1092. — .06501| — 04759) — .04686| — .00004| 02410 
m | 14349) .06382) 09075) .00834) — .05104 
| 14976, .07769| — .02062| — .02907 
| .23943) .00581| — .07214 
.46163| 25050 


| . 28088 


Now we need the coefficients of o;%, in the expectations of squares 
and products of the right members of the equations of Table 21. These 
computations are facilitated by setting up Table 23. 


TABLE 23 
Herd sire,subclasses 


Right 
members| 11 12 13 21 22 23 31 32 41 43 
Ms. 3 0 0 1 0 0 0 0 3 0 
Y2... 2 4 0 3 0 0 3 4 5 0 
Ys... 0 2 0 0 5 4 0 2 0 3 
Ya... 0 0 5 0 0 2 0 0 0 6 
Was. 5 6 5 0 0 0 0 0 0 0 
Y.2.. 0 0 0 4 5 6 0 0 0 0 
s.. 0 0 0 0 0 0 3 6 0 0 
Yi. 5 0 0 4 0 0 3 0 8 0 
Y..2, 0 6 0 0 5 0 0 6 0 0 
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The coefficients of oj, in the squares and products of right members 
are the squares or products of appropriate rows in Table 23. For 
example, the coefficient of oj, in E y/... is 3? + 17 + 3? = 19. That 
in E(y;...y2...) is 3(2) + 1(3) + 3(5) = 24. The complete set of co- 
efficients is presented in Table 24, with entries to the left of the diagonal 
omitted due to the symmetry of the matrix. 


TABLE 24 

a a, a3 Qs h h, hs 81 
a 19 24 0 0 15 4 0 43 0 
a2 79 16 0 34 12 33 71 48 
a3 58 26 12 49 12 0 49 
a 65 25 12 0 0 0 
hy 86 0 0 25 36 
ha 77 0 16 % 
hs 45 36 
8) 114 0 
S82 97 


Multiplying and summing corresponding entries of Tabiecs 22 and 24, 


K, = 19(.64297) + 2(24)(.41685) + --- + 97(.28088) 


= 38.29 


Calculation of K, requires the inverse of the matrix of coefficients vi 
Table 21 with s, and s, columns and rows deleted. This inverse is 


presented in Table 25. 
TABLE 25 
a a2 a3 a, h, h, hs 
a .17524 | .03588 | .03770 | .03027 | —.06049 | —.04552 | — .03629 
az .10581 | .05361 | .03375 | —.06365 | —.06022 | —.09421 
a3 . 13358 | .03635 | —.05523 | —.09823 | —.07138 
a .10523 | —.05576 | —.04461 | —.03433 
h, . 12204 .05734 .06178 
ha . 14663 .06867 
hs . 20025 


= 
| 4 
| | 
| 
| 
| 


VARIANCE AND COVARIANCE COMPONENTS 249 


Also needed in the computation of K, are the coefficients of of in the 
expectations of squares and products of right members of the least 
squares equations. Table 26 facilitates this computation. 


TABLE 26 
Sires 
Right members 

1 2 3 
"1... 7 0 0 
13 8 9 
Ys... 0 9 
Ys... 0 0 13 
Y.1. | 5 6 5 
Y.2.. | 4 5 6 
Y.3.. 3 6 0 


The coefficient of of in E(yj...) is 77 = 49, in E(y;...y2...) is 7(13) 
= 91, etc. The complete set is shown in Table 27. 


TABLE 27 
ay a, a3 h hs 
a 49 91 0 0 35 28 21 
a, 233 72 0 113 92 87 
as 130 91 89 87 54 
a 169 65 78 0 
h 86 80 51 
ha 77 42 
hs 45 


Now K, = 49 (.17524) + 2(91) (.03588) + --- + 45 (.20025) 
= 42.29 
K; is obtained from Table 24 and 25, thus 
Ks; = 19 (.17524) + 2 (24) (.03588) + --- + 45 (.20025) 
= 31.71. 


In a similar manner K, is found to be 22.57 and K, to be 22.57 (the 
equality of K, and K, is only a coincidence.) 

Table 28 presents the pertinent reductions and their expectations 
(excluding » and a, terms) 
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TABLE 28 
oh oe 

57 57 57 57 11,124,007 
R(years, herd X sire 

subclasses) 57 57 57 13 10,973,518 
R(years, herds, sires) 57 57 38.29 9 10,921,107 
R(years, herds) 57 42.29 31.71 7 10,919 ,698 
R(years, sires) 22.57 57 22.57 6 10,800 ,679 


Then the equations to be solved are those of Table 29. 


TABLE 29 
Che 
Among Herds 34.43 0 15.72 3 120 ,428 
Among Sires 0 14.71 6.58 2 1,409 
Herds X Sires 0 0 18.71 4 52,411 
Residual 0 0 0 44 150 ,489 
The solution to these equations is o, = 2255, 0 = —1295, of, = 


2070, and o? = 3420. 


ESTIMATION OF COMPONENTS OF COVARIANCE 


The same general principles described in Methods 1, 2, and 3 for 
estimating variances can be employed to estimate covariances. To 
illustrate, suppose an observation is made on each of the progeny re- 
sulting from single crosses among inbred lines. If y;;, is the observation 
on the k-th progeny of the i-th male line by the j-th female line cross, 
a model which might reasonably be assumed is 


Yiu = BEG + Git M; + 855 + sin 
where g; is the general combining ability of the 7-th line, g; is the general 
combining ability of the j-th line, m; is the maternal ability (exclusive 
of the genes transmitted to the progeny) of the j-th line, s,; is peculiar 
to crosses 7 X j and of j X 7, and e;;, isa random error. Suppose further 


that the elements of the model fit Eisenhart’s Model II except that 
E(gim,) = - 
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The problem is to estimate the variances and o,,,. If Method 3 is 
used, unbiased estimates of the variances are obtained. If Method 1 is 
used, the estimates are biased due to the presence of o,,,. If the least 
squares estimates of g; and m; are computed, an unbiased estimate of 
gm can be derived from >_; g; m;, the expectation of which is (p —1)om 
+>, k, 0. , where p is the number of lines and k, o¢ is the covariance 
between g, and m, , assuming that g; and m, are fixed. 

A more frequently occurring type of covariance estimation problem 
in animal breeding arises in connection with estimation of genetic and 
phenotypic correlations. Observations are taken on two or more traits 
in some population. The following linear models characterize such 
observations on two traits. 


+ 


t=1 


(20) Yo 


> +e 


b, and 0b! are fixed fori = 1, --- , q 
b, and b{ are random variables fori = q+ 1,-:-,p 


So far as the random variables are concerned, 


Eb, = Ebi = 0 = 
E(e.)” = o All other covariances = 0 


Now in place of least squares reductions like 
we substitute 
or Y.Y./C;; , 


where Y,, , refers to a right member of the least squares equations for 
the second trait. 

Then the expectations of these reductions are exactly the same as 
described for estimation of variance components except that o,;, , is 
substituted for 0; and o,,- , is substituted for o¢. Therefore, any of the 
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three methods for estimating variances can be used equally well for 
estimating coviriances when (20) is the model. 
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QUERIES 


GerorcE W. SNEDEcOoR, Editor 


QUERY: We have been using the technique given in Snedecor’s 
100 “Statistical Methods”, secticn 11:10 and have come across some 

results which lave raised a question of correct procedure. The 
data are the weighi. grams) of foetal membranes in swire oi the 25th. 
day of gestation. 


Litter Size 
Mating <9 >9 
ke 
cc 2 12.700 + 9.525 
PC 4 8.575 : 5.100 
cP 2 10.600 3 7.200 
PP 3 8.233 3 8.400 
| 

Source of Degrees of Sum of Mean 

Variation Freedom Squares Square 

Litter Size 1 11.06 

Mating 3 24.82 

Error 14 108.23 7.73 


Our calculations yield the following results: 
> WD? = 37.03 
(>> WD)?/>> W = 24.28 


Interaction 12.75 
253 
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11.06 


Litter Size Sum of Squares, unadjusted 


Litter Size Sum of Squares, adjusted 24.28 


| 
Since interaction may be assumed negligible, we proceeded as follows: 


A. Correction for Disproportion = —13.22 

| | Is it possible to have a negative correction? If so, is it correct to 
} use algebraic subtraction in this next step? 

i Mating Sum of Squares, unadjusted = 24.82 

| Correction for Disproportion = —13.22 


3 Difference for Adjusted Sum of Squares = 38.04 


For the 2 X 2 table it is possible to have a negative 
a ANSWER: _ correction to be subtracted algebraically as you have done. 
‘ That is, the adjusted sum of squares may be larger than 
the unadjusted. 

For the R X 2 table, the method of adjustment given in table 11.22 
of my text is incorrect. My attention was called to this recently by 
Dr. J. O. Irwin. This adjusting device applies to the 2 X 2 table but 
not to the larger one. 

A method for calculating the adjusted sum of squares for the rows 
is illustrated below. It is similar to the scheme for calculating the 
interaction in table 11.22. The distinction is that the sums, S = Z, + Z2, 
are used instead of the differences. 


| 


W = WS 
ki + ky 
1.3333 22.225 29.6326 
0.8000 13.675 10.9400 
1.2000 17.800 21.3600 
1.5000 16.633 24.9495 
> W = 4.8333 > WS = 86.8821 


> WS? = 1,603.38 
(> WS)?/ W = 1,561.77 


Lp Sum of Squares for mating = 41.61 
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The final analysis of variance for your data is in this form (I carried 


mor Jecimals than you did). 


Source of Degrees of Sum of Mean 

Variation Freedom Squares Square 
Litter Size 1 24.32 24.32 
Mating 3 41.61 13.87 
Interaction 3 12.69 4.23 
Error 14 108.23 7.73 


QUERY: hh. three experiments I have observed that treatment 
101 1 produces more than B, but in no case was the difference 

significant. The experiments differed in design so that I cannot 
combine the original observations. Yet I think there should be some 
way to take advantage of the fact tlat in every experiment the proba- 
bility was less than 0.2. It doesn’t seem that this would be likely to 
happen if there were no real difference in yield. 


R. A. Fisher devised a metnod for combining comparable 
probabilities such as you seem to have. See his “Statistical 
Methods for Research ‘Vorkers’’, section 21.1. Bancroft 
and Anderson in their “Statistical Theory in Research”, section 12.6, 
have emphasized the allowable variation in design. 

The method depends on the facts that (i) the sum of several values 
of chi-square is itself distributed as chi-square with the appropriate 
number of degrees of freedom and that (ii) minus twice the natural 
logarithm of a probability is distributed as chi-square with 2 degrees of 
freedom. 

As an example I shall parallel Fisher’s illustration, using common 


logarithms and changing the probabilities to conform to your specifica- 
tion. 


ANSWER: 


P —log P Degrees of Freedom 
0.145 0.8386 2 
0.200 0.6990 2 
0.087 1.0605 2 
Total 2.5981 


x? = 2(2.3026)(2.5981) = 11.965 
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The factor 2.3026 changes the common logarithm to the natural. 

For 6 degrees of freedom, x” = 11.965 corresponds to the probability 
of about 0.062. Thus, the evidence against the null hypothesis is 
greater than that in any of your individual experiments. 


QUERY: I set up a 2° X 4 factorial experiment to study the 
102 effect of light on the feathering of chicks. I planned to use 

the second and third order interactions to estimate error, but 
the four birds in one lot died from causes not associated with the treat- 
ment. Is there any way to analyze the variance of the remaining 
treatments? The data enclosed are percentages of undesirable feathers 
at 10 and 12 weeks of age. 


R. L. Anderson suggested a way to analyze the variance 
ANSWER: of an unreplicated experiment with a missing treatment. 
“Tf high-order interactions are used as estimates of error, 
missing values should be determined by the process of minimizing the 
error variance.”’ Biometrics Bulletin, Volume 2, page 43, 1946. ; 
For illustration, it will be sufficient to use only part of your datd. 
I have chosen the two levels of starting light treatment, s, (10 and 15 
hours per day during the first six weeks), the two dates of killing, k, 
(10 and 12 weeks), and three of the levels of finishing light treatment, 
f, (12, 18 and 24 hours from six weeks to killing). Again, for illustration 
only, I shall assign to the estimate of error not only the 2 degrees of 
freedom for the three-factor interaction, SKF, but also the 2 for the 
interactions of deviations from linear trend with the 2-level treatments, 
SF Q and KF Q- 
The following work sheet contains the data for females, among 
which the missing datum occurred. 
The sum of the four indicated squares is 381.85 — 23.622 + 0.52”. 
This is minimized by z = 23.62. For those who are not familiar with 
the rule for differentiation, use the formula, 


— (coefficient of x) 
~ (coefficient of x”) 


The substitution of z = 23.62 in the sum of squares for error yields 


Sum of Squares for Error = 102.90 
Mean Square (4 — 1 = 3df.) = 34.30 


The degrees of freedom for error are, as usual, decreased by one to 
compensate for the missing datum. 
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The value x = 23.62 is now substituted in the difference for each 
effect not assigned to error and the mean square calculated in the 
regular way. 

Professor Anderson writes as follows: ‘‘As you indicate, the treat- 
ment comparisons will be slightly biased if the usual analysis of variance 
is used with the missing value. One might suggest using the covariance 
technique I used in my Biometrics Bulletin article on ‘Missing Plot 
Techniques.’ But since each comparison has a single degree of freedom, 
it might be easier to compute the variance for each comparison (hence 
adjust the variance instead of the numerator). However even this 
would be so complicated that I doubt if anyone would bother to adjust 
for the bias.” 
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ENAR Joint Meeting of The Biometric Society and The Institute of 
Mathematical Statistics, April 29-May 1, 1953. 


J. EDWARD JACKSON AND ROBERT H. MORRIS. 

212 (Eastman Kodak Company, Rochester, New York). The Ap- 
plication of Multivariate Quality Control to a Photographic 
Problem. 


The use of univariate Quality Control procedures in photographic 
problems fails in many cases because several highly correlated measure- 
ments must be studied simultaneously. The use of Hotelling’s T’ in the 
form T? = (x — x,)S”'(x — 2)’ leads to practical difficulties because the 
determinant of S is usually quite small. However, the use of Principal 
Components in conjunction with the 7’ technique offers a workable 
control tool which, in addition, supplies better photographic measure- 
ments than those previously used. A practical example will be given 
illustrating the use of this technique on an actual process control prob- 
lem. 


H. C. BATSON. (University of Illinois College of Medicine, 
213 Chicago). Factorial Chi-square Analysis of Data from Experi- 
ments in Immunology. ‘ 


The previously unpublished method developed by A. E. Brandt for 
Chi-square analysis of attribute data from factorial experiments is pre- 
sented in sufficient detail to permit employment of the technique by 
others. The method, designated ‘Factorial Chi-square’ by Brandt, is 
essentially a form of analysis of variance employing normalized estimates 
of variance. The error term used in variance ratios is the normalized 
population variance calculated from the totals of the outcome groups in 
each experiment. Factorial coefficients are used directly in calculating 
Chi-square values based on individual degrees of freedom. Applications 
of the method are illustrated by means of examples taken from experi- 
mental immunology. Included among the examples are a 2 X 2 factorial, 
a2 xX 2 X 3 factorial, a 2 X 2 factorial with 4-fold replication and a 
single classification experiment in which the number of individuals per 
experimental unit are unequal. 
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214 IRWIN D. J. BROSS. (Cornell University Medical College). 
Applications of Non-Parametric Methods to Medical Data. 


Four rank order significance tests are briefly described with the aid 
of a Drion diagram (Ann. Math. Stat., 1952, pp. 653-74). Pros and 
cons of the methods as applied to medical data are considered. Three 
illustrations are given of situations where the methods seem especially 
appropriate. In particular it is shown that Drion’s small sample version 
of the Smirnov test can be used to construct self-stopping follow-up 
schemes that would appear to be of value in clinical trials. 


15 DAVID G. KENDALL. (Magdalen College, Oxford and Prince- 
ton University). Stochastic Growth and Mutation Processes. 


The paper presents a simple technique (worked out in collaboration 
with W. A. O’N. Waugh) for dealing with the Bellman-Harris integral 
equation for stochastic growth, when the division-time distribution, 
dG(r), is a weighted convolution of x3-distributions. The method is then 
applied to the formulation and study of a stochastic model of bacterial 
mutation incorporating ‘phenotypic delay”’. 


GEORGE E. P. BOX AND W. A. HAY. (Institute of Statistics, 
Raleigh, N. C. and Imperial Chemical Industries, Manchester, 

216 England). Statistical Designs for the Efficient Removal of 
Trends Occurring in Comparative Experiments with Applications 
in Biological Assay. 


Consider the regression of a quantitative result y, which is measur- 
able but subject to error, on a quantitative factor x, which can be fixed 
by the experimenter at any desired level. For example in biological 
assay y would be biological response and x the dose of drug. A design 
is derived by means of which two such regressions (corresponding for 
example to a test preparation and a standard preparation of a drug) may 
be compared whilst eliminating, without loss of efficiency, a smooth time 
trend in the response y, which occurs due to factors outside the control 
of the experimenter. The experiments are performed in k pairs. In the 
u-th pair a dose d, of the test preparation and a dose ad, of the standard 
preparation is administered, where u = 1, 2, --- , k and @ is a suitably 
chosen constant so that roughly the same response is obtained with test 
and standard preparations. The doses d, are such that d, ,d:,--- , di. 
are orthogonal with the elements ¢, ¢’, --- , ¢’ of a polynomial in time 
fitted to the pair means, and gq and 7 are so chosen that the polynomials 
adequately represent the dose and time effects. In the important special 
case where a linear time trend is adequate the device of angular random- 
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isation (Box, G. E. P. ‘“Multi-factor Designs of First Order”, Biometrika, 
Vol. 39 (1952), pp. 49-57) may be used to select the design. This ensures 
that subsequent normal ‘theory statistical tests applied are exact what- 
ever the distribution of the response and whether the mathematical 
model can be strictly justified or not 

A method is indicated whereby residual time effects within pairs may 
be eliminated using the between pairs trend estimate. 


W. S. CONNOR AND W. H. CLATWORTHY. (National 
217 Bureau of Standards). Necessary Conditioas for the Existence 


of Partially Balanced Incomplete Block Designs with Two 
Associate Classes. 


For a partially belanced incomplete bioch design with two associate 
classes and with parameters v, b, 7, k,n; , 2, Ag, and pj, (2,7, = 1,2), 
the following theorem has been proved: If (*) » > b. then it is necessary 
that (a) A be a perfect square and (b) either r — r, = 0, orr — r, = 0; 
(ii) v = b and v is even, then it is necessary that (a) A be a perfect square 
and (b) r — r, be a perfect square when a, is odd (u = 1, 2); (iii) v = 6, 
v is of the form 4¢ + 3 (¢ = 0, 1, 2, ---), and A is not a perfect square, 
then it is necessary that (r — 7,)(r — r,) be a perfect square, and (iv) 
v < band vis even, then it is necessary that A be a perfect square where 
n= 0A. — + (-)° VA) + A. + = 1, 2), 
Y = Piz — Pi2, A= ¥° + 284+ 1, B = pis + Piz, and a, and a, are 
nonnegative integers such that a, + a, = v — 1. Examples are given of 
sets of parameters which fail to satisfy these conditions. 


A. C. COHEN, JR. (The University of Georgia, Athens). Esti- 
218 mation in Truncated Bivariate Normal Distributions (Preliminary 
Report). 

Maximum likelihood estimators of the parameters of a bivariate 
normal population are developed for samples which are subjected to a 
truncation on one of the variates at known terminals. Both single and 
double truncations with the number of missing (unmeasured) observa- 
tions either known or unknown are considered. Asymptotic variances 
of the estimates are obtained from the likelihood information matrices. 


219 R. F. DRENICK AND P. NESBEDA. (RCA Victor Division, 
Camden, N. J.). On a Class of Optimum Linear Predictors. 


Prediction is the problem of projecting into the future a set of ob- 
served data in order to obtain estimate for future observable data. For 
optimum prediction one assigns, through some considerations which are 
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not part of the method, a loss function representing the penalty for 
error. An optimum prediction procedure is the one which minimizes, 
in the long run, this penalty. N. Wiener pointed out that the optimum 
mean square predictor is linear if the interference affecting the observa- 
tions has gaussian probability distribution. By using a method of 
estimation due to Pitman (Estimation of location and scale parameters. 
Biometrika 30 (1939)) the authors show that the class of linear predictors 
is characterized by the gaussian probability distribution and by a loss 
function more general than rms, namely, one which is symmetric and 
has continuous derivatives. Most of the loss functions of practical 
interest are in this category. Furthermore any such loss function leads 
to the same linear predictor X, which has also the property: P(X, — 
x < k) = max for all k > 0, x being the true value. (Work sponsored 
by the Bureau of Aeronautics.) 


D. B. DUNCAN. (Virginia Polytechnic Institute, Blacksburg). 
220 Multiple Range Tests and the Multiple Comparisons Test 
(Preliminary Report). 

Several methods are available for testing differences between treat- 
ments in an analysis of variance. The two considered most satisfactory 
are one by Newman (1939) and Keuls (1952) and the Multiple Com- 
parisons Test by Duncan (1951). Both employ repeated homogeneity 
tests. The Newman-Keuls test is simpler because it uses repeated range 
tests instead of F tests as used by the Multiple Comparisons Test. The 
latter is generally more sensitive owing partly to this reason but mostly 
to the relaxation of the significance levels of some of the tests considered 
to be of diminished importance. This paper presents: a new Multiple 
Range Test which achieves the simplicity of the Newman-Keuls test by 
using range tests and most of the sensitivity of the Multiple Comparisons 
Test by using the special significance levels, and an improved set of 
application rules for the Multiple Comparisons Test. Each of these is 
recommended for use depending on the relative needs for simplicity or 
sensitivity. The special system of significance levels is discussed in some 
detail. The author is indebted to W. Beyer in the determination of sig- 
nificant ranges for the new test which is still in progress. 


NORMAN LLOYD JOHNSON. (Institute of Statistics, Chapel 
221 _~=—«Hill, N. C.). Sequential Procedures in Component of Variance 
Problems. 


Three types of sequential procedure are discussed. (1) Procedures 
based on the use of sample variance at each stage, (2) procedures based 
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on successive independent groups of samples, and (3) procedures based 
on the isolation of independent degrees of freedom at each stage. 

In cases (2) and (3) approximate formulae for the average sampling 
number (ASN) are evaluated. It is found that (3) gives a lower ASN 
than (2) when the successive groups in (2) are of small size but that (2) 
gives the lower ASN (formally) for larger sized groups. It is conjec- 
tured that the latter (ASN for large group size in (2)) may be related to 
the ASN under procedure (1). 


MARVIN ZI:LEN. (National Bureau of Standards). The 
222 Analysis of Some Incomplete Block Designs with a Missing 
Block. 


During the course of carrying out an experiment the situation may 
arise where all the observations from a particular block are missing. 
This paper outlines the statistical analysis if a block is missing for (a) 
balanced incomplete block designs, (b) partially balanced incomplete 
block designs of the group divisible type, (c) simply linked block designs, 
and (d) several miscellaneous cases which depend on the relationship 
of the treat ments ia the missing block to each other. 


C. 1. BLISS, THEODORE GREINER AND HARRY GOLD. 
223 (Yale University and Cornell University Medical College). 
Estimating the Dose of a Cardiac Glycoside for Human Subjects. 


The threshold dose of digitoxin has been estimated for each of 89 
human subjects, ranging in age from 3 to 74 years. When dosage was 
expressed in micrograms per pound of body weight, children required a 
59% larger dose than adulte. Four partial regression equations have 
been computed from the same data, for children and adults separating 
males and females, in each equation relating the logarithm of the thres- 
hold dose per individual to his log-weight, log-height and age. In every 
case, the regression on age was less than its error. Moreover, the regres- 
sion equations of log-dose upon log-weight and log-height did not differ 
significantly between children and adults in either sex. Combining age 
groups within each sex, log-height contributed no information and the 
regressions on log-weight for males and for females could be fitted by 
parallel lines with a slope of .55, with males requiring 27% more digitoxin 
than females. Adjusting dosage approximately to the square root of the 
body weight should correct adequately for variations in the age and size 
of patients. 
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CLYDE Y. KRAMER AND DAVID B. DUNCAN. (Virginia 
224 Polytechnic Institute, Blacksburg). On the Analysis of Variance 
of a Two-Way Classification with Unequal Subclass Numbers. 


This paper reviews important current methods on the analysis of 
variance of a two-way classification with unequal subclass numbers, 
presenting them from a unified point of view and also proposes a new 
method which has special advantages for particular situations. 

For cases in which no interaction is present, the most efficient pro- 
cedure is the method of fitting constants. The method of weighted 
squares of means is much simpler to apply but would generally sacrifice 
too much efficiency. The virtue of the new method is that it is equally 
simple for such cases and is more efficient than the method of weighted 
squares of means. 

There are situations in which this would not be true and in which 
case the method of weighted squares of means should be used. A quick 
test for recognizing situations of this type is provided. 

The new method tends to give weight to subclass means more in 
proportion to the numbers on which they are based than does the method 
of weighted squares of means. When the number of subclasses are larga 
the time and effort saved by the proposed method may outweigh the loss 
of efficiency. 


ALBERT B. PARKS. (Bureau of Human Nutrition and Home 


225 ‘Economics, U.S.D.A.). Ranking Versus Scoring in Palatability 
Tests Using Small, Trained Panels. 


A design for the simultaneous scoring and ranking of samples in 
palatability tests is presented and illustrated by an experiment with a 
small, trained panel used in the detection of off-flavors. Parallel sets of 
results, in the form of mean scores as compared with estimated treat- 
ment ratings based on ranks, are discussed in terms of validity, consis- 
tency of tests of significance, relative discrimination among treatments, 
and performance of panel members. It is suggested that this design 
may be useful in testing a ranking technique when a series of experiments 
is in progress and a scoring method has been employed. Continuity of 
past results is maintained. 


MAX A. BERSHAD, WILLIAM N. HURWITZ AND RALPH 

226 S. WOODRUFF. (Bureau of the Census). Sampling for Time 
Series. 

This presentation examines a rotation pattern for a sample which is a 

composite of the optimum rotation for month-to-month change and the 
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optimum for totals (or averages) over a time period. In connection with 
this rotation pattern, an estimate, which is a composite of a chained 
estimate and a simple unbiased estimate, is examined. The variances 
of some common statistics are presented for the composite estimate and 
compared with the variances for the chained and unbiased estimates. 
The composite estimate is proved to be the best linear estimate of level 
for any one time period when all past observations, first assumed as 
numerous, are taken into account. The form of the estimate when many 
observations are not available is given and is indicated as approximated 
well by the formula for many observations. 


227 WALTER A. HENDRICKS. (Bureau of Agricultural Econom- 
ics). Response Rates and Selectivity in Mail Surveys. 

Cumulative means per sampling unit from successive mailed returns 
often follow a simple equation of the form, Y = aX°. This provides 2 
simple basis for extrapolating estimates to a 100 percent-return equiva- 
lent. Response rates seem to follow a law identical to relationships 
found in dosage-mortality studies in biology. These relationships seem 
to hold when surveys are conducted on a particular subject in a restricted 
universe. When general-purpose surveys are conducted, the relationship 
between percentage returns and means per sampling unit for individual 
items becomes more complex. In general, there seem to be factors which 
induce a respondent to fill out a mailed questionnaire and other factors 
which work simultaneously in the opposite direction. The net effect 
of these opposing influences governs the response rate to a mailed 
survey. The particular way in which the items to be estimated from 
the survey are correlated with the factors affecting response determines 
the form of the relationship between cumulative means and cumulative 
response rates from repeated mailings. In general-purpose surveys there 
does not seem to be any easily predictable pattern because of the multi- 
plicity of interrelationships. 


French Region, February 25, 1953. 


J. SUTTER AND L. TABAH. Resultats au Test Mosaique 
228 de Gille dans 1.244 Fratries de Deux Enfants et 380 Couples de 
Jumeaux. 

Dans les fratries de deux enfants, aprés réduction du facteur Age, les 
performances des garcons précédés ou suivis d’un gargon, apparaissent 
significativement inférieures 4 celles des gargons précédés ou suivis 
d’une fille. Par contre, les filles ne sont pas influencées par la composition 
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de la fratrie. I] existe une corrélation positive trés significative entre 
les performances au test dans les fratries et la durée de l’intervalle 
intergénésique. Le rang de naissance n’exerce pas d’influence dans les 
diverses catégories de fratries, sauf dans les fratries de deux garcons ou 
deux filles, urbaines et cont l’intervalle intergénésique est faible; les 
premiers nés apparaissent alors inférieurs aux deuxiémes nés. Le coeffi- 
cient de corrélation intra-classe apparaft de l’ordre de 0,5 dans toutes les 
catégories de fratries. 

Parmi les couples de jumeaux, dont il n’a pas été possible de distin- 
guer les monozygotes des dizygotes, les performances apparaissent in- 
férieures 4 celles des non jumeaux. Les couples de gargons sont infé- 
rieurs aux couples de filles. Les coefficients de corrélation sont de 0,842 


chez les garcons, 0,823 chez les filles et 0,781 dans les couples hétéro- 
genes en sexe. 


British Region, February 26, 1953. 


KATHERINE H. COWARD. The Use of the Logistic Curve 
in Bio-Assay. 


The logistic curve best fitting the data may be determined by 
strating from trial estimates of the ceiling and base values, and then 
modifying these values until the average of the ratios of the doses corre- 
sponding to successive Y-values most nearly approaches the known 
ratio of the doses g.ven. Test and Standard can be worked out side by 
side. 

The potency of the Test Substance in terms of Standard is given by 
(a) the number of units of Standard and (b) the weight of Test Sub- 
stance corresponding to one standard interval of the curve. (Ref: 
Emmens, C. W.: J. Endocrinology, 1940, 2, 194). 


230 J.O. IRWIN. On the ‘Transition Probabilities’ Corresponding 
to Any Accident Distribution. 


From any known distribution of accidents in a fixed exposure time T’, 
the expected number of other accidents sustained by a person who has 
x accidents is obtained. It is the ratio of the z + 1-th to the z-th 
factorial moment of the distribution. Its limiting value when the expo- 
sure time tends to zero gives the transition probabilities. (To appear in 
J. Roy. Statist. Soc. B.) 
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J. A. FRASER ROBERTS. The Use of Regressions Involving 
231 Variances of Independent Variates for Calculating Age-Corrected 
Scores. 


It is often useful to calculate once and for all an age-corrected score 
for each individual. If there is a substantial change of variance with age 
as well as of mean, the scores can be adjusted by using the regression of 
the variance of the measurement on age (weighting the variance in each 
array by the number in the array). The fit of the regression may be 
tested by a method due to R. A. Fisher. 

The mean square, derived by dividing the weighted sum of squares 
of deviations from the regression line by k — r, where k is the number 
of arrays and r — 1 the order of the polynomial is: 


n,(Y> ») /(k r) (1) 


The theoretical sampling variance of an estimate y of a true variance Y, 
based on a sample of n, is 2Y’/n. (1) therefore estimates approximately 
the mean value of 2Y’, taken over the k arrays, namely 2 >> Y3/k. The 
approximation, which should lead to little discrepancy, is due to the 
fact that the observed variances are weighted only according to the 
number of observations and not according to estimates of their true 
values. Thus the mean square due to linear regression is compared with 
twice the square of the mean variance, with degrees of freedom 1 and ~; 
the extra mean square due to the quadratic term with 2/k times the 
sum of squares of the expected variances given by the linear function, 
etc.; the remainder with 2/k times the sum of squares of the expected 
variances given by the highest polynomial fitted, with degrees of free- 
dom (k — r) and ~. 

Application of the method to the Terman-Merrill revision of the 
Binet Scale removes the great bulk of the very large and troublesome 
residual association between age and I. Q. 

Mean blood-pressure grows steadily ‘nendhent life and variance 
increases more than fivefold. In a genetic investigation it is essential 
to be able to compare and add in various ways persons of very different 
ages, for example, the parents, the sibs and the children of a sample of 
subjects. Adjustment for variance as well as for mean has provided 
individual scores which are almost entirely independent of age. (See 
J. A. F. Roberts and M. A. Mellone. British Journal of Psychology, 
Statistical Section, 5, 65, 1952; G. W. Pickering, G. S. C. Sowry, M. 
Hamilton and J. A. F. Roberts. Clinical Science, in preparation.) 
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: THE THIRD INTERNATIONAL BIOMETRIC CONFERENCE 
Preliminary Programme 
Lake of Como, Bellagio, Italy—September 1-5, 1953 


1st September 


9:00 A.M. Inauguration: Welcoming address and Presidential ad- 
dress. 


op 10:00 A.M. The first course in biometry—a symposium (Chairman: 
W. G. Cochran) 

L. Martin. Enseignement des principes d’expérimentation 
des méthodes statistiques 4 des biologistes dans deux 
éstablissements belges d’enseignement supérieur. 

G. Barbensi. L’insegnamento della Biometria. 

C. I. Bliss. A course in Biometry for graduate students in 
Biology. 

By Title: A. Vessereau. Enseignement des methodes 
statistiques appliquées 4 la Biometrie. 


12:30 P.M. First general business meeting. 


3:00 P.M. Mathematical problems in Genetics (Chairman: A. Buzzati- 
Traverso) 
Sir Ronald Fisher. The variability in the length of germ 
7 plasm still heterogeneous after a given amount of 
inbreeding. 
K. Mather. The methodology of Biometrical Genetics. 
A. R. G. Owen. Experimental designs in Genetics. 
: D. Lowry. Variance components with reference to genetic 
population parameters. 
C. A. B. Smith. The calculation of correlation between 
cousins. 


2nd September. 


i! 9:00 A.M. Methodological Problems in Biometry (Chairman: Gertrude 

M. Cox) 

| J. W. Hopkins. Some needed significance tests. 

1 F. Anscombe. Fixed-sample-size analysis of sequential 
observations. 
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3:00 P.M. 


2:30 P.M. 
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W.G. Cochran. The combination of estimates from differ- 
ent experiments. 
N. Blomqvist. Rank analysis of incomplete block designs. 
M. Keuls. Testing differences between means in an analy- 
sis of variance. 
M. J. R. Healy. Decision between two alternatives: how 
many experiments? 
By Title: L. Martin. Suggestions for longitudinal data 
in gerontology. 
J. Hemelrijk and J. H. B. Kemperman. Use of 
a sequential Wilcoxon-test for diagnostic 
purposes. 


Biometry in Immunology (Chairman: H. C. Batson) 

R. Prigge. Die Anwendung der Mutungsbereiche in der 
Immunitaetsforschung. 

I. Ipsen. Factors of dosage and host determining antibody 
response to secondary antigen stimulus. 

L. B. Holt. Quantitative studies in diphtheria prophy- 
laxis—an attempt to derive a mathematical charac- 
terisation of the antigenicity of diphtheria prophy- 
lactics. 

S. Peto. A dose response equation for the invasion of 
microorganisms. 


3rd September. 
9:00 A.M. 


Biometric methods in Agriculture (Chairman: P. V. Suk- 
hatme) 

F. Yates. The place of simple experiments on cultivator’s 
fields in agricultural development. 

V. G. Panse. Principles of the survey method of experi- 
mentation. 

T. N. Hoblyn and S. C. Pearce.. Biometrical problems in 
research on long-lived plants. 

H. Strecker and J. Raab. Methodological problems in a 
survey of milk production of hog breeders in Germany. 

G. Rasch. On different sources of errors and the advantage 
of their knowledge in planning experiments. 

J. L. Lush. Estimating heritabilities. 

M. Matemura. Designs for agricultural research in Japan. 


Excursion. 
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9:00 A.M. 


12:00 A.M. 
3:00 P.M. 


8:00 P.M. 
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4th September. 


Functional relations in experimentation (Chairman: H. 
Wold) 

D. J. Finney. Functional relationships in experimentation. 

J. Berkson. Minimum chi square and maximum likelihood 
estimates of regression coefficients. 

J. Neyman. Frisch’s problem on linear structural rela- 
tions. 


Biometrics business meeting. 


Contributed papers (Chairman: M. J. R. Healy) 
A. F. Parker-Rhodes. Estimating populations of irregu- 
larly observable organisms. 
D. W. Goodall. Factor analysis in Plant Sociology. 
E. F. Scott. Bivariate contagious distributions. 
G. Karreman. The mathematical biology threshold and 
related phenomena in excitation. 
By Title: M.W. Bentzon. On the statistical evaluation 
of dose response curves in case the dose 
intervals are large. 


Social dinner. 


September. 


9:00 A.M. 


11:30 A.M. 
12:00 A.M. 


Industrial applications of Biometry (Chairman: A. Linder) 

E. A. G. Knowles. Applications of experimental designs in 
industry. 

D. R. Read. The design of chemical experiments. 

H. C. Hamaker. Experimental designs in industry: a 
discussion. 


General business session. 


Closing of the Conference. 
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THE BIOMETRIC SOCIETY 


Indian Members. The Indian group sponsored jointly with the 
Indian Society of Agricultural Statistics a symposium in New Delhi 
on the 26th of February on ‘New Problems in Experimentation.” 
Frank Yates presided and the speakers included D. J. Finney, K. R. 
Nair, V. G. Panse and others. 

British Region. The British Region held a joint symposium at the 
Wellcome Research Institution in London on March 11th on “Flavour 
Assessment” in collaboration with the Society of Chemical Industry 
(Food Group) and the Society of Public Analysts (Biological Methods 
Group). The program consisted of papers by E. D. Adrian on The 
physiological background of flavour assessment, by H. G. Harvey on 
Basic considerations in regard to fiavour assessment, by A. S. C. Ehren- 
burg and J. M. Shewan on The objective approach to sensory tests, by 
J. M. Harries on Sensory tests and consumer acceptance, and by 
J. O. Irwin on A biometrician’s viewpoint. A more complete account 
of the meeting has been published in Nature for April 25, 1953. 

Italian Region. The ‘Third Italian Meeting was held at the Uni- 
versity of Florence on Aprii 2nd. In the morning session on ‘Genetic 
Selection” papers were presented by I. M. Lerner, currently a guest at 
the University of Pavia, on The problem of quantification of selection 
methods, and by R. Scossiroli, also of Pavia, on Statistical analysis of 
selection experiments in Drosophila populations treated with X-rays. 
The afternoon program on ‘General Methodology” offered lectures by 
F. Brambilla, of the University of Genoa and the University Bacconi, 
Milan, on Confluence analysis, and by G. Barbensi on Criticisms on the 
use of chi-square in the fourfold table. At the annual business meeting, 
which followed the morning scientific program, L. L. Cavalli reported on 
the activities of the year. Statutes for the Region were discussed and 
approved. It was agreed to send a letter over the signatures of several 
university professors in medical and biological faculties recommending 
that these faculties provide teaching in statistics. This letter is to be 
given as wide circulation as possible. The Region will sponsor a 
duplicated bulletin of methodological papers of general interest, with 
emphasis on the popularization of statistical methods. It will be sent 
without charge to each member. Through the initiative of Professor 
Brambilla, a Methodological Section has been formed which holds 
weekly seminars at the Department of Statistics of the University 
Bacconi of Milan, 
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ENAR. On April 29th to May Ist the Eastern North American 
Region met jointly with the Institute of Mathematical Statistics in 
Washington, D. C. The opening session on April 29th considered 
“Statistics in the Physical Sciences’”’ with papers by J. E. Jackson and 
R. H. Morris on The application of multivariate quality control to a 
photographic problem, and by J. M. Cameron on Control and measure- 
ment of experimental error. On Thursday “Statistics in the Biological 
Sciences” provided the morning program, with papers by H. C. Batson 
on Factorial chi-square analysis of data from experiments in immunol- 
ogy, by I. D. J. Bross on Applications of nonparametric methods to 
medical data, by D. G. Kendall on Stochastic growth and mutation 
processes, and by G. E. P. Box and W. A. Hay on Statistical designs for 
the efficient removal of trends occurring in comparative experiments with 
applications in biological assay. The afternoon program offered papers 
by W. H. Clatworthy and W. S. Connor on Necessary conditions for the 
existence of partially balanced incomplete block designs with two associ- 
ate classes, by A. C. Cohen, Jr. on Estimation in truncated bivariate 
normal distributions, by R. F. Drenick and P. Nesbeda on A clas# of 
optimum linear predictors, by D. B. Duncan on Multiple range tests 
and the multiple comparisons test, by N. L. Johnson on Sequential 
procedures in component of variance problems, and by M. C. K. 
Tweedie on Sequential estimation. On Friday papers were given by 
M. Zelen on The analysis of some incomplete block designs with a missing 
block, by A. W. Kimball on The fitting of multi-hit survival curves, by 
C. I. Bliss, T. Greiner and H. Gold on Estimating the dose of a cardiac 
glycoside for human subjects, by C. Y. Kramer On the analysis of vari- 
ance of a two-way classification with unequal sub-class numbers, by 
A. B. Parks on Ranking versus scoring in palatability tests using small, 
trained panels, by M. A. Bershad, W. N. Hurwitz and R. S. Woodruff 
on Sampling for time series, by W. A. Hendricks on Response rates and 
selectivity in mail surveys, by R. C. Bose and 8. N. Roy on Simultaneous 
confidence interval estimation and testing of hypotheses. 

Japanese Members. Largely through the efforts of Mr. M. Hata- 
mura of the National Institute of Agricultural Sciences in Tokyo, the 
Society has at this date 25 members in Japan. Mr. Hatamura has agreed 
to serve provisionally as National Secretary. He reports that a meeting 
is planned in the near future to develop a program for Japan. 
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NOTES 


SumMMER STATISTICAL SEMINAR 


University of Connecticut, 1953 


The Summer Seminar in Statistics will meet for the fourth year at the 
University of Connecticut during the three weeks of August 10 through 
28, 1953. The general pattern of programs is to be the same as in previ- 
ous years. Informality and discussion will be stressed. There will be one 
or two seminar sessions each day and a clinic on the treatment of prob- 
lems in application. 

The first week, August 10 through 14 will be devoted to Statistical 
Methodology in Physics. It is being organized by Dr. E. W. Pike, 
Equipment Engineering Division, Raytheon Manufacturing Company, 
Newton 58, Mass., together with Dr. Churchill Eisenhart and Dr. E. P. 
King. 

The second week, August 17 through 21 will be devoted to Statistics 
in Biometry and Medicine. it is being organized by Professor G. Beall, 
Statistical Laboratory, University of Connecticut, Storrs, Conn., to- 
gether with Dr. I. Bross and Dr. D. Mainland. 

The third week, August 24 through 28 will be broken into two parts. 
The first will be devoted to the ASA Handbook, as organized by Professor 
F. Mosteller, Laboratory of Social Relations, Harvard University, Cam- 
bridge 38, Mass. The second part will be devoted to Performance and 
Reliability of Complex Mechanical Assemblies, as organized by Pro- 
fessor G. H. Shortley, The Johns Hopkins University, 6410 Connecticut 
Avenue, Chevy Chase, Maryland. 

Anyone interested in the subjects under discussion is invited to 
attend for the day, week, or other period. (A nominal registration fee 
will be collected). 

For further discussion please write the Secretary of the Seminar, 
Professor Geoffrey Beall, Statistical Laboratory, University of Connecti- 
cut, Storrs, Connecticut. Information will be provided on lodging and 
meals. A detailed outline of the program will be furnished. In addition 
to supplying information, the Secretary will welcome suggestions of 
topics for the Seminar or Clinic sessions. 
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INTERNATIONAL CONGRESS OF MATHEMATICIANS 1954 


First Communication 


In the final plenary session of the International Congress of Mathe- 
maticians, 1950, held in Cambridge (Mass.) the Congress accepted the 
invitation of the delegation from the Netherlands to hold the next 
Congress in the Netherlands. 

The International Congress of Mathematicians 1954 will be held in 
Amsterdam from September 2nd to September 9th under the auspices of 
“Het Wiskundig Genootschap” (The Mathematical Society of the 
Netherlands). It is the sincere hope of the “Wiskundig Genootschap”’ 
that the Congress 1954, which will be open to all mathematicians from 
all parts of the world, will be a fertile international gathering. 

The Organizing Committee has invited a number of outstanding 
mathematicians to deliver one-hour addresses, hoping that in this way a 
survey of the recent development in the whole field of mathematics may 
be furnished. 

There will be seven sections, viz: (1) Algebra and Theory of Num- 
bers, (2) Analysis, (3) Geometry and Topology, (4) Probabflity and 
Statistics, (5) Mathematical Physics and Applied Mathematics, (6) 


Logic and Foundations, and (7) Philosophy, History and Education. 


2 
> 
| 
\ 
| 


Pees 
“ig 


